SOME NEW METHODS IN MATRIX CALCULATION¹

By Harold Hotelling 
Columbia University 
I. Introduction

1. The increased practical importance of matrix calculation. This paper will be concerned chiefly with methods of finding the inverse of a matrix, and of finding the latent roots and latent vectors, which are also known by a variety of other names associated with particular applications, such as principal axes in geometry and mechanics, and principal components in psychology. These two computational problems are of extremely wide application. The first is closely related to the solution of systems of linear equations, which we shall also consider. In the method of least squares the solution of the normal equations is best carried out with the help of the inverse of the matrix of the coefficients, since at least some of the elements of this inverse matrix are needed in evaluating the results in terms of probability, a vitally necessary step, and since the inverse matrix is useful also in various other ways, such as altering the set of predictors used in a regression equation. Modern statistics also utilizes quadratic and bilinear forms such as the generalized Student ratio [15] for discriminating between samples according to multiple variates instead of one only, the associated discriminant functions [10], the closely related figurative distance of Mahalanobis, Bose and Roy [5], and the critical statistic in an investigation by Wald [28] of the efficient classification of an individual into one of two groups. All these may be calculated very easily from the inverse of a matrix of sums of products, or of covariances or correlations, or from the principal components. Consideration of the relations between two sets of variates [18] may utilize both the inverse of a matrix and a process resembling the calculation of principal components. Similar computational problems arise in applying to sets of numerous variates the contributions to multivariate statistical analysis of R. A. Fisher, S. S. Wilks, W. G. Madow, M. A. Girshick, P. L. Hsu and M. S. Bartlett. Among the non-statistical applications of the inverse matrix and of latent roots and vectors are problems of dynamics, both in astronomy and in airplane design [12], the analysis of stresses and strains in structures [26, 27], and electrical engineering problems [24].

Perhaps no objection to attempts at statistical inference is more common than that the variation of this or that relevant factor has been ignored. For example in dealing with time series the need of allowing for trend and seasonal variation, perhaps by means of a sequence of orthogonal polynomials for trend and of trigonometric functions for seasonal variation, is well recognized. It is indeed desirable to use regression equations with a liberal number of predictors to eliminate spurious influences, as well as to reduce the error variance, and likewise in other statistical methods. But the computational difficulties in the joint analysis of the desired number of variables have frequently seemed too formidable. We shall see how efficient techniques, in conjunction with efficient machines, can go far to facilitate the use of an appropriate number of variables by reducing the labor to modest dimensions.

¹ Revision of a paper presented at the Symposium on Numerical Calculation held Dec. 28, 1941 in New York by the Institute of Mathematical Statistics and the American Statistical Association with the cooperation of the Committee on Addresses in Applied Mathematics of the American Mathematical Society. For the program of the Symposium see the Annals of Mathematical Statistics for March, 1942, p. 103.

While the rise of modern multivariate statistical theory has made available new exact tests of hypotheses in terms of probability over a wide range of cases in which multiple measurements are involved, such measurements have been accumulating on a large scale. In many psychological, anthropometric, astronomical, meteorological and economic fields, actual measurements are available on numbers of variates far greater than have been regarded as amenable, within practical limits, to adequate treatment by the numerical methods generally used. In some instances the number of cases in which complete sets of these variates are available is also large. The 1931 census of India included an extensive sample in which fifty physical variates were measured for each individual. Karl J. Holzinger and his collaborators have worked out and circulated privately a complete matrix of correlations among 78 mental tests. Astronomers have indicated the desirability of a recalculation of the elements of the solar system by means of a gigantic least-square solution with 150 or more unknowns, at the same time deploring the seeming impossibility of this ever being carried out. To apply the methods of modern theoretical statistics to derive from such observations all the important information they contain is an enterprise whose feasibility depends on new numerical methods.

The chief computational problems, apart from those of tabulating and providing convenient approximations for the probability distributions, are (1) the calculation of the many sums of products of pairs of p variates when p is large, and (2) operations on the matrices of these sums of products such as finding the inverse and the principal components. The first problem, which in classical applications of the method of least squares to long series has seemed the heavier, has in a sense been solved by the use of punched cards. A card is used for each case, and all p variates are punched into it. By running the cards repeatedly through a machine wired at each run to select a particular pair of variates, multiply them together, and cumulate the products, this part of the work may be disposed of with great speed. The cost of the machines does at present limit the economical use of this method to rather large numbers, both of variates and of cases. This limit has recently been pushed upward by the introduction of improved multiplying calculators, with high-speed automatic multiplication and squaring locks. But these mechanical advances, in combination with recent discoveries in statistical theory, the increasingly felt need to resort to numerous variates, and the actual existence in many cases of data on such multiple variates, emphasize the need for rapid, economical and accurate calculations with matrices whose elements are sums of products.
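As a modern illustration of the pass-by-pass scheme just described, the sketch below treats the deck as an array with one row ("card") per case and cumulates each sum of products in its own pass over the deck. The array layout and the helper name `sums_of_products_by_passes` are assumptions made for illustration only; the resulting matrix is the same as `cards.T @ cards`.

```python
import numpy as np

def sums_of_products_by_passes(cards):
    """Hypothetical helper: cumulate sums of products one pair of variates at a time.

    `cards` is an (n_cases, p) array, one row ("card") per case holding all p
    variates. Each pair (i, j) with i <= j costs one pass through the deck, in
    which the two selected variates are multiplied and the products cumulated.
    """
    cards = np.asarray(cards, dtype=float)
    _, p = cards.shape
    S = np.zeros((p, p))
    passes = 0
    for i in range(p):
        for j in range(i, p):
            total = 0.0
            for card in cards:              # one run of the deck through the machine
                total += card[i] * card[j]  # multiply the selected pair and cumulate
            S[i, j] = S[j, i] = total       # the matrix of sums of products is symmetric
            passes += 1
    return S, passes

cards = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 10]])
S, passes = sums_of_products_by_passes(cards)
```

Because the matrix is symmetric, p(p + 1)/2 passes suffice; here p = 3 gives 6 passes.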





Modern machine methods, especially those of the punched-card type, but also those using machines such as the Monroe, Marchant and Friden, tend to reduce the work of formation of sums of products, in comparison with other operations, to such an extent as to enhance the relative value of methods in which such calculation of direct product-sums is important. Thus products of matrices are much simpler to compute than inverses, and positive than negative powers. Indeed, powers and products of matrices can be computed by means of punched-card machines, and for large matrices this is doubtless the most efficient procedure now available, though considerable rewiring is needed. There is also a possibility, which does not seem too remote, of development of further devices to do this rewiring automatically.

2. Iterative and direct methods. Partitioned matrices. In later sections we shall deal chiefly with certain iterative methods, giving particular attention to the neglected question of limits of error in stopping at any point, and considering the rate of approach to the desired solution. For finding the roots of a matrix and the associated vectors, if the matrix has more than about four rows, it seems clear that an iterative method is the most economical of labor in all but very special cases. On the other hand the problems of solving systems of linear equations and finding the inverse of a matrix do not usually yield readily to iterative methods unless an approximation to the solution is available to begin with. This approximation is not necessarily a very close one, but must not be too wild. It may in some cases be obtained from a general knowledge of the subject.

The Mallock electrical device [22] is capable of solving almost instantaneously ten linear equations in ten unknowns with perhaps two significant digits in each result, though this question of accuracy remains to be elucidated. The combination of this device with the iterative method of Section 7 below, and with the use of partitioning for matrices of more than ten rows, offers what seems at present the best hope for the systematic inversion of large matrices. Since only one of the Mallock machines is in existence (it is in Cambridge, England), some adaptation of the Doolittle or related methods will ordinarily be useful. By taking advantage of the possibilities in modern calculating machines of accumulating products to reduce the amount of writing required in the Doolittle method, exceedingly compact and efficient methods have been developed for solving systems of linear equations and for evaluating inverse matrices by Dwyer [7, 8, 9], who utilized the earlier work of Waugh, Kurtz, Horst, Dunlap and Cureton cited by him, and for solving systems of linear equations, by Crout [6]. Dwyer gives valuable bibliographies.

By some of these methods, or from a general knowledge of the subject, one may well obtain approximate solutions correct to a very small number of decimal places, and then by iteration get as many more places as are required, with labor far less than would be necessary to carry through from the beginning the requisite number of places. Further applications of iterative methods arise when a least-square solution is to be revised, either on account of new observations or because of errors discovered in the original observations or calculations. But however a least-square calculation or the evaluation of any inverse matrix begins, and whatever intermediate steps are taken, it seems advisable to terminate it with the method of Section 7. This combines a check on the previous work, at a labor cost equivalent merely to substituting the values found for the unknowns into the equations, with an improvement in accuracy and a useful limit of error for the unknowns.

In the inversion of large matrices there are important possibilities in the properties of partitioning. For example, a square matrix of 2p rows may be partitioned into four square matrices a, b, c, d, of p rows, and written

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

If this is multiplied on the right by another partitioned square matrix of 2p rows which may be written

$$\begin{bmatrix} A & C \\ B & D \end{bmatrix},$$

where A, B, C, D are square p-rowed matrices, the product

$$\begin{bmatrix} aA + bB & aC + bD \\ cA + dB & cC + dD \end{bmatrix}$$

is identical with the result of partitioning the product of the two original 2p-rowed matrices. If the second is the inverse of the first, this product is the identical matrix. Consequently, if the first matrix is given, we have for determining its inverse the four matrix equations in A, B, C, D,

$$aA + bB = 1, \qquad aC + bD = 0,$$

$$cA + dB = 0, \qquad cC + dD = 1,$$

where 1 stands for the identical matrix of p rows and 0 for the p-rowed matrix consisting entirely of zeros. These equations may be solved just as in elementary algebra except that care must be used to perform matrix multiplications in correct order. Thus

$$A = (a - b d^{-1} c)^{-1}, \qquad B = -d^{-1} c A,$$

$$D = (d - c a^{-1} b)^{-1}, \qquad C = -a^{-1} b D.$$

These formulae call for inversion of four p-rowed matrices, namely $a$, $d$, $a - bd^{-1}c$, and $d - ca^{-1}b$. Without changing the number of such inversions we may choose alternative sets of matrices to invert, with economy of labor in certain cases. For example, if b is easy to invert, we may use for D the expression

$$D = b^{-1} a A b d^{-1}.$$
The formulae and numerical work are further simplified if the given matrix is symmetric. Other modes of partitioning are also possible, and may be valuable in various kinds of numerical work. Another method of obtaining the inverse of a matrix by partitioning is given by Frazer, Duncan and Collar [12, pp. 112-118], who also give an account of general properties of partitioned matrices. In the treatment of relations between two or more sets of variates [18, 31], partitioned matrices appear.
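The partitioned formulae for A, B, C, D can be checked numerically. The sketch below is a minimal illustration assuming NumPy, with `np.linalg.inv` standing in for whatever method is applied to the p-rowed blocks; it assembles the inverse in the arrangement [[A, C], [B, D]] used above and also evaluates the alternative expression for D that uses b⁻¹. The function name `block_inverse` and the test blocks are hypothetical.

```python
import numpy as np

def block_inverse(a, b, c, d):
    """Inverse of [[a, b], [c, d]] from its p-rowed blocks, arranged as [[A, C], [B, D]].

    A = (a - b d^-1 c)^-1,  B = -d^-1 c A,
    D = (d - c a^-1 b)^-1,  C = -a^-1 b D.
    np.linalg.inv stands in for whatever p-rowed inversion method is used.
    """
    inv = np.linalg.inv
    a_inv, d_inv = inv(a), inv(d)
    A = inv(a - b @ d_inv @ c)
    B = -d_inv @ c @ A
    D = inv(d - c @ a_inv @ b)
    C = -a_inv @ b @ D
    return np.block([[A, C], [B, D]]), A, D

rng = np.random.default_rng(0)
p = 3
# Hypothetical blocks, chosen so that every matrix to be inverted is well-conditioned.
a = 0.5 * rng.normal(size=(p, p)) + 4 * np.eye(p)
b = 0.5 * rng.normal(size=(p, p)) + 3 * np.eye(p)
c = 0.5 * rng.normal(size=(p, p))
d = 0.5 * rng.normal(size=(p, p)) + 4 * np.eye(p)
M = np.block([[a, b], [c, d]])

M_inv, A, D = block_inverse(a, b, c, d)

# Alternative expression for D, useful when b is easy to invert: D = b^-1 a A b d^-1.
D_alt = np.linalg.inv(b) @ a @ A @ b @ np.linalg.inv(d)
```

The alternative trades the inversion of d - ca⁻¹b for an inversion of b, so the count of p-rowed inversions stays at four.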

The most efficient method of calculation of a function of a matrix will depend in part on what else is to be calculated. For example, if the latent roots and vectors are needed for any reason as well as the inverse of a matrix, it is better to calculate the former first, and then the determination of the inverse matrix becomes a trivial task; but if the latent roots and vectors are not needed for some other purpose it is usually better not to calculate them but to use a more direct method to obtain the inverse. If in addition to the inverse the determinant is wanted, or many consecutive powers of a matrix, or if a matrix-multiplying machine considerably speedier than present procedures becomes available, a method [3] based on the Cayley-Hamilton theorem that a matrix satisfies its own characteristic equation may be recommended.
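Both routes can be sketched briefly, assuming NumPy. For a symmetric matrix with latent roots λ and latent vectors forming the orthogonal matrix V, the inverse is V diag(1/λ) V'. For the Cayley-Hamilton route the sketch uses the Faddeev-LeVerrier recursion, one standard way of exploiting the theorem that needs only matrix products, traces and a final division, and that also yields the characteristic coefficients and the determinant. It is offered as an illustration and is not claimed to be the specific procedure of [3]; the function names are hypothetical.

```python
import numpy as np

def inverse_from_latent_vectors(S):
    """Inverse of a symmetric matrix from its latent roots and vectors: V diag(1/roots) V'."""
    roots, V = np.linalg.eigh(S)
    return (V / roots) @ V.T            # column j of V divided by the jth root

def faddeev_leverrier(A):
    """Characteristic coefficients, inverse and determinant by the Faddeev-LeVerrier
    recursion, which uses only matrix products, traces and one final division."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    identity = np.eye(n)
    M = np.zeros((n, n))
    c = 1.0                             # leading coefficient of det(lambda*1 - A)
    coeffs = [c]
    for k in range(1, n + 1):
        M = A @ M + c * identity
        c = -np.trace(A @ M) / k
        coeffs.append(c)
    # Cayley-Hamilton gives A @ M + c * identity = 0 after the last step,
    # so the inverse is -M / c, and c is (-1)^n times the determinant.
    inverse = -M / c
    determinant = (-1) ** n * c
    return np.array(coeffs), inverse, determinant

S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
S_inv_roots = inverse_from_latent_vectors(S)
coeffs, S_inv, det_S = faddeev_leverrier(S)
```

The intermediate matrices M are the successive polynomials in A that appear in the Cayley-Hamilton identity, which is why this route also gives the determinant at no extra cost.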

Iterative methods have what Whittaker and Robinson [30] call the pleasing characteristic that mistakes do not necessarily spoil the whole calculation, but tend to be corrected at later stages. This of course does not mean that there is no penalty for mistakes. They have an obvious tendency to increase the number of repetitions required, and if repeated at late stages may actually prevent realization of a substantially correct result. A less obvious consequence of mistakes near the termination of an iterative calculation is that they tend to vitiate any limits of error that may be derived, including those that will be found below. Great care should be used to insure accurate calculation, especially in the last stages of any iterative process.

To insure accuracy even before the last stages, and therefore efficiency, a check column consisting of the sums of the elements in the rows of matrices multiplied and added together may well be carried along. In multiplying two matrices only the check column of the second factor is used; it is multiplied by each row of the first factor to obtain the check column for the product, whose entries must agree with the row sums of the product. A computer thoroughly experienced with matrix multiplication may dispense with the check column at all stages but the last of an iterative process, relying on the self-correcting property of the process.
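The check-column rule works because each row sum of a product equals the corresponding row of the first factor multiplied by the row sums of the second. A minimal sketch, assuming NumPy and small integer matrices so the arithmetic is exact; the helper names are hypothetical.

```python
import numpy as np

def check_column(M):
    """Check column of a matrix: the sum of the elements in each row."""
    return np.asarray(M).sum(axis=1)

def multiply_with_check(P, Q):
    """Return P @ Q together with the check column predicted for it.

    Only the check column of the second factor is used: multiplying it by
    each row of P gives the check column the product ought to have.
    """
    product = P @ Q
    predicted = P @ check_column(Q)
    return product, predicted

P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])
product, predicted = multiply_with_check(P, Q)

# A slip in one element of the product shows up as a disagreement in its row.
slipped = product.copy()
slipped[1, 0] += 1
```

For these factors the predicted check column is (41, 93), matching the row sums 19 + 22 and 43 + 50 of the product; after the slip the second row sums to 94.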

A simple but extremely valuable bit of equipment in matrix multiplication consists of two plain cards, with a re-entrant right angle cut out of one or both of them if symmetric matrices are to be multiplied. In getting the element of the ith row and jth column of the product, the ith row of the first factor and the jth column of the second should be marked by a card beside, above, or below it. In writing a symmetric matrix it is convenient to omit the elements below the principal diagonal. The re-entrant right angle is then utilized to mark off the numbers belonging to a particular row.
A report [13] on certain iterative methods of solving linear and other equations and of calculating latent roots and vectors, with engineering applications, was published by R. von Mises and H. Geiringer in 1929. As part of a discussion of certain problems in psychology [16] the present author in 1933 described iterative processes both for solving systems of linear equations and for finding principal components, and later [17] showed how to accelerate convergence to principal components by repeatedly squaring the matrix. Further acceleration of convergence by other devices has been discovered by A. C. Aitken [2]. Dr. Geiringer has also discussed a method of solution of equations involving iteration by small groups of unknowns [14]. The method of Kelley and Salisbury [20] should be noted. It has been used extensively by psychologists. Definite limits of error and measures of rate of convergence for this method are missing. Certain other iterative methods will be discussed in later sections. It will appear that the most-used methods are by no means the best.

Questions regarding the probability of a matrix of covariances satisfying particular conditions of computational significance may in some cases be illuminated with the help of the theory of the variates as a random sample of a larger aggregate. This theory was outlined in the latter part of the paper [16].

II. Linear Equations and Inverse Matrices

3. Accuracy of direct solution of linear equations. The question how many decimal places should be retained in the various stages of a least-square solution and of other calculations involving linear equations has been a puzzling one. It has not generally been realized how rapidly errors resulting from rounding may accumulate in the successive steps of such procedures as, for example, the Doolittle method. In this popular algorism for solving a system of equations

$$\sum_{j=1}^{p} a_{ij} x_j = g_i \qquad (i = 1, \dots, p),$$

the equivalent of successive eliminations of $x_1, x_2, \dots, x_{p-1}$ to obtain an equation in $x_p$ alone is accomplished by calculating successively

$$a_{ij\cdot 1} = a_{ij} - a_{i1} a_{1j}/a_{11}, \qquad g_{i\cdot 1} = g_i - a_{i1} g_1/a_{11} \qquad (i, j = 2, 3, \dots, p),$$

then

$$a_{ij\cdot 12} = a_{ij\cdot 1} - a_{i2\cdot 1} a_{2j\cdot 1}/a_{22\cdot 1}, \qquad g_{i\cdot 12} = g_{i\cdot 1} - a_{i2\cdot 1} g_{2\cdot 1}/a_{22\cdot 1} \qquad (i, j = 3, \dots, p),$$

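and so forth. Let us suppose that each of the $a_{ij}$'s and $g_i$'s is subject to an error concerning which it is known only that its absolute value does not exceed $\epsilon$. Thus if they are given accurately to k decimal places only, we have $\epsilon = 10^{-k}/2$. Let the actual errors be represented by $\delta a_{ij}$ and $\delta g_i$. If these are small an estimate of the error in $g_{i\cdot 1}$ may be obtained by expanding in a Taylor series and retaining only the linear terms:

$$\delta g_{i\cdot 1} = \delta g_i - \frac{g_1}{a_{11}}\,\delta a_{i1} - \frac{a_{i1}}{a_{11}}\,\delta g_1 + \frac{a_{i1} g_1}{a_{11}^2}\,\delta a_{11}.$$

The closest upper bound for this error obtainable without special assumptions regarding the values of the given quantities is specified by the inequality

$$|\delta g_{i\cdot 1}| \le \epsilon \left(1 + \left|\frac{g_1}{a_{11}}\right| + \left|\frac{a_{i1}}{a_{11}}\right| + \left|\frac{a_{i1} g_1}{a_{11}^2}\right|\right).$$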
The a's and g's are often correlation coefficients. Any set of normal equations of least squares may be reduced to a form in which this is the case, and this reduction has considerable merits. The various correlation coefficients are frequently of interest in themselves, and their use in the normal equations practically insures that all the quantities appearing at any stage are of the same order of magnitude. This last is a very substantial advantage, partly because of the check column which is customarily carried along, in which each entry is the sum of the other entries in its row. Since the absolute value of a correlation coefficient is less than unity, and since $a_{11}$ becomes equal to unity, the last inequality gives in this case

$$|\delta g_{i\cdot 1}| < 4\epsilon,$$

and no closer inequality appears possible. In the same way we find for this case in which the a's are correlation coefficients that

$$|\delta a_{ij\cdot 1}| < 4\epsilon.$$

Proceeding from these inequalities in the same way, and neglecting the fact that $|a_{22\cdot 1}| < 1$ though, like $a_{11}$, it is put equal to unity in the argument, we find for the errors in $a_{ij\cdot 12}$ and $g_{i\cdot 12}$ the estimated upper bound $16\epsilon$, with an actual upper bound somewhat higher unless $a_{12} = 0$. Continuing in the same way we find for $a_{ij\cdot 12\cdots(p-1)}$ and $g_{i\cdot 12\cdots(p-1)}$ the estimated limit of error $4^{p-1}\epsilon$, with a possibility of a somewhat higher value up to $4^{p-1}\epsilon/a$, where a is the determinant $|a_{ij}| < 1$. The rapidity with which this increases with p is a caution against relying on the results of the Doolittle method or other similar elimination methods with any moderate number of decimal places when the number of equations and unknowns is at all large. Thus if p = 11 the limit of error exceeds a million times $\epsilon$, indicating that if only one decimal place is wanted in the value of $x_p$ the original correlations must be utilized to at least seven decimals, even if we neglect the additional errors introduced by dropping decimals beyond those retained in the intermediate stages of the calculation. The errors accumulate further during the back solution, so that if all the unknowns are wanted with one-place accuracy it is necessary to use the original correlations with substantially more than seven decimal places. For larger values of p the increase in the error limit is startling. Thus for p = 27 (the number of tests reported to be involved in a certain current procedure in classifying military personnel) the limit of error even for the first unknown evaluated is $4^{26}\epsilon$, representing a loss of about 16 decimal places of accuracy, while the correlations in Holzinger's 78-rowed matrix would need to be carried to no less than 46 places to insure even an approximate accuracy in the first decimal place of one of the regression coefficients in a formula derived by least squares for predicting one of his variates in terms of all the others.
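The arithmetic behind these figures is easy to reproduce: an estimated limit of $4^{p-1}\epsilon$ corresponds to a loss of about $(p-1)\log_{10} 4 \approx 0.602\,(p-1)$ decimal places. The short sketch below uses only that estimate, not the somewhat higher limit $4^{p-1}\epsilon/a$; the helper name is hypothetical.

```python
import math

def error_growth(p):
    """Estimated error factor 4**(p - 1) and the corresponding decimal places lost."""
    factor = 4 ** (p - 1)
    return factor, math.log10(factor)

for p in (11, 27, 78):
    factor, places = error_growth(p)
    print(f"p = {p:2d}: limit = {factor:.3e} * eps, about {places:.1f} decimal places lost")
```

For p = 11 the factor is 4¹⁰ = 1,048,576, for p = 27 the loss is about 15.7 places, and for Holzinger's 78 variates about 46.4 places.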

These high limits of error may possibly be reduced in the following ways: (a) a more exact study of the error might be made by means of terms of the Taylor series of orders higher than the first; (b) the positive definite character of a correlation matrix (or other matrix of normal equations) might be utilized in an attempt to arrive at lower limits of error; (c) instead of considering the maximum possible error we might depend on some mutual cancellation of different errors and content ourselves with statements in terms of probability. The compounding of different errors of rounding, which may individually be regarded as having a probability distribution of uniform density over a fixed range, quickly gives rise to an almost exactly normal distribution of known mean and variance, so that the probability approach is attractive. However the limits of error obtained in this way with, for example, a five per cent level of probability of a greater error, though somewhat smaller than the limits associated with certainty, are disappointingly large. Investigations of the types (a) and (b) have not been made; they would apparently be very cumbersome, and (a) might have the effect of increasing the error limits considered above instead of cutting them down. Use of the check column does not provide any safeguard against the errors of rounding appearing in the original correlations, though from the probability standpoint, a carefully devised use of the check column may mitigate the accumulation of errors in successive stages.
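The probability approach in (c) can be illustrated with a small simulation. Assuming n independent rounding errors, each uniform on [-ε, ε] and hence with variance ε²/3, their sum is nearly normal with variance nε²/3. The sketch compares the worst-case limit nε with an approximate five per cent limit 1.96 ε √(n/3). It models only a plain sum of errors, not their full propagation through the Doolittle method, and the values of n and ε are arbitrary choices.

```python
import numpy as np

def rounding_error_limits(n_errors, eps, z=1.96):
    """Worst-case and approximate two-sided 5% limits for a sum of n_errors
    independent rounding errors, each uniform on [-eps, eps]."""
    worst = n_errors * eps
    sd = eps * np.sqrt(n_errors / 3.0)   # each error has variance eps**2 / 3
    return worst, z * sd

rng = np.random.default_rng(1)
n, eps = 50, 0.5e-4                      # e.g. 50 errors from rounding to 4 places
sums = rng.uniform(-eps, eps, size=(20_000, n)).sum(axis=1)
worst, limit_5pct = rounding_error_limits(n, eps)
frac_beyond = np.mean(np.abs(sums) > limit_5pct)
print(f"worst case {worst:.2e}, 5% limit {limit_5pct:.2e}, simulated exceedance {frac_beyond:.3f}")
```

With these choices the five per cent limit is about 4.0 × 10⁻⁴ against a worst case of 2.5 × 10⁻³: smaller, but it still grows like √n.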

To control such errors reliance is often placed on a substitution of the solution obtained into the given equations. This is not completely satisfactory, since under some circumstances large errors in the solution may yield only slight deviations of the left from the right members of the equations, and since some deviations must be expected in any case in which only a limited number of decimals is carried along. Moreover this substitution, even if it reveals the existence of errors, does not usually make clear at once what should be done about them. A recalculation to a larger number of decimal places is horribly laborious. There is here a distinct need of using an iterative process for improving on the solution obtained, and setting definite limits for the errors.
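The following sketch carries out the successive eliminations of Section 3 (the $a_{ij\cdot 1}, a_{ij\cdot 12}, \dots$ condensation, without any interchange of rows) followed by the back solution, and then illustrates why substitution is an imperfect check: for a nearly singular correlation matrix, a trial solution that is wrong by 0.5 in each unknown leaves residuals of only 0.0005. The helper name and the example matrix are hypothetical.

```python
import numpy as np

def eliminate_and_back_solve(a, g):
    """Successive elimination (a_ij.1 = a_ij - a_i1 a_1j / a_11, and so on,
    with no interchange of rows) followed by the back solution."""
    a = np.array(a, dtype=float)
    g = np.array(g, dtype=float)
    p = len(g)
    for k in range(p - 1):                   # eliminate the (k+1)th unknown
        for i in range(k + 1, p):
            factor = a[i, k] / a[k, k]
            a[i, k:] -= factor * a[k, k:]
            g[i] -= factor * g[k]
    x = np.zeros(p)
    for i in range(p - 1, -1, -1):           # back solution
        x[i] = (g[i] - a[i, i + 1:] @ x[i + 1:]) / a[i, i]
    return x

r = 0.999                                    # hypothetical, nearly singular correlation matrix
A = np.array([[1.0, r], [r, 1.0]])
x_true = np.array([1.0, 1.0])
g = A @ x_true
x = eliminate_and_back_solve(A, g)

x_bad = np.array([1.5, 0.5])                 # wrong by 0.5 in each unknown
residual = g - A @ x_bad                     # yet each residual is only 0.0005 in size
```

The residuals equal A applied to the error vector (-0.5, 0.5), which this matrix shrinks by the factor 1 - r = 0.001.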

4. The classical iterative method. The iterative method which seems to be the oldest and the most used for solving systems of linear equations, and which may like all other methods of doing this be applied to find the inverse of a matrix, is that of Gauss and Seidel. It seems also to be used in the “method of relaxations” [26], which has been recommended to engineers but lacks limits of error and measures of rate of convergence.

This classical method, starting with any assumed values for the unknowns, begins by changing the value for the first unknown so as to satisfy the first equation; this is possible if the coefficient $a_{11}$ is different from zero. The revised set of trial values is then further altered by changing the second unknown so as to satisfy the second equation. Then the third unknown is altered so that the third equation will be satisfied, and so forth. When all the unknowns have been thus altered the cycle may be begun again, and repeated until the differences between consecutive values of each unknown become small enough to indicate a satisfactory convergence. The method converges if the matrix A of the coefficients $a_{ij}$ is positive definite, as it is for the normal equations of least squares, and also in certain engineering applications [7, 8, 9]. Moreover the character of being positive definite insures that each $a_{ii}$ differs from zero, so that the successive adjustments indicated are all actually possible. In the published discussions, proofs of convergence have sometimes been omitted, and in some cases (e.g. [30], Sec. 130) the proofs are incomplete. Even the fuller proofs [13] and [10] fail to give explicit limits for the errors in stopping at any particular stage. But from the discussion [16, pp. 502, 504] it is easy to see that positive numbers $d_i$ and k exist, with k < 1, such that the error in the mth estimate of $x_i$ is less than $d_i k^m$. This limit of error diminishes in geometric progression with successive iterations; hence the number of decimal places of accuracy increases approximately in arithmetic progression. The progression is however irregular and the trial values may fluctuate considerably. Numerical determination of limits of error does not appear to be easy. Experience with the method indicates that it is satisfactory only in case a really good approximation is available to begin with, in spite of its universal convergence.
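A minimal sketch of the classical cycle described above, assuming NumPy and a hypothetical positive definite example; it records the trial values after each cycle so that the roughly geometric shrinking of the error can be inspected.

```python
import numpy as np

def gauss_seidel(a, g, x0=None, cycles=25):
    """Classical Gauss-Seidel iteration for a x = g; returns the trial values after each cycle."""
    a = np.asarray(a, dtype=float)
    g = np.asarray(g, dtype=float)
    p = len(g)
    x = np.zeros(p) if x0 is None else np.array(x0, dtype=float)
    history = []
    for _ in range(cycles):
        for i in range(p):
            # Change x_i so that equation i is satisfied, using the newest values
            # of all the other unknowns.
            others = a[i] @ x - a[i, i] * x[i]
            x[i] = (g[i] - others) / a[i, i]
        history.append(x.copy())
    return history

a = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])     # hypothetical positive definite example
g = np.array([1.0, 2.0, 3.0])
history = gauss_seidel(a, g)
exact = np.linalg.solve(a, g)
errors = [np.max(np.abs(x - exact)) for x in history]
```

Each cycle multiplies the error by roughly a constant factor below one, which is why the number of correct decimals grows only arithmetically with the number of cycles.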

5. An acceleration and extension of the classical iteration. This classical scheme may be improved in the following way if numerous cycles of revision of the trial values are expected to be needed for the requisite accuracy. The first step, consisting of replacing the trial value $x_1$ by

$$x_1' = (g_1 - a_{12}x_2 - \cdots - a_{1p}x_p)/a_{11}$$

and leaving $x_2, \dots, x_p$ unchanged, amounts to subjecting the p + 1 variables $x_0, x_1, \dots, x_p$ to the homogeneous transformation

$$\begin{aligned}
x_0' &= x_0,\\
x_1' &= (g_1 x_0 - a_{12}x_2 - \cdots - a_{1p}x_p)/a_{11},\\
x_2' &= x_2,\\
&\;\;\vdots\\
x_p' &= x_p,
\end{aligned}$$

where the symbol $x_0$, introduced for convenience in order to make these equations homogeneous, is always equal to unity. The matrix of the transformation,

$$T_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0\\
g_1/a_{11} & 0 & -a_{12}/a_{11} & -a_{13}/a_{11} & \cdots & -a_{1p}/a_{11}\\
0 & 0 & 1 & 0 & \cdots & 0\\
0 & 0 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix},$$

is of course singular, since its second column, which multiplies the old value of $x_1$, consists entirely of zeros. If $X_0$ denote the one-column, (p + 1)-rowed matrix of the initial trial values, with unity at the head of the column, the column matrix $X_1 = T_1 X_0$ is the result of this first operation, again with unity at the head of the column. The trial values obtained by the second operation appear likewise in the column matrix $X_2 = T_2 X_1 = T_2 T_1 X_0$, where



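$$T_2 = \begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0\\
0 & 1 & 0 & 0 & \cdots & 0\\
g_2/a_{22} & -a_{21}/a_{22} & 0 & -a_{23}/a_{22} & \cdots & -a_{2p}/a_{22}\\
0 & 0 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}.$$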


The result of a complete cycle of substitutions may be written $X_p = T_p T_{p-1} \cdots T_2 T_1 X_0$, where the matrices $T_i$ are of the same simple character illustrated by $T_1$ and $T_2$. This same result will be obtained, because of the associative law of matrix multiplication, if we first calculate numerically the matrix

$$T = T_p T_{p-1} \cdots T_2 T_1$$

and then $X_p = T X_0$. (Experience shows that computers need at this point the caution that the matrices must be arranged in their proper order. A good procedure is first to form $T_2 T_1$, then to multiply this by $T_3$ on the left, etc.) This requires rather more work than the original Gauss-Seidel scheme, and therefore is not worth while if only one cycle of substitutions is needed.
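To make the bookkeeping concrete, the sketch below builds each $T_i$ as a (p + 1)-rowed NumPy array, with position 0 holding $x_0 = 1$, multiplies the factors in the proper order (each new factor on the left), and checks that $T X_0$ reproduces one ordinary Gauss-Seidel cycle. The helper names and the example system are hypothetical.

```python
import numpy as np

def substitution_matrix(a, g, i):
    """Hypothetical helper: the (p+1)-rowed matrix T_{i+1} of the text (i is 0-based).

    Position 0 holds x_0 = 1 and position i+1 holds the unknown being replaced.
    """
    a = np.asarray(a, dtype=float)
    p = len(g)
    T = np.eye(p + 1)
    r = i + 1                                          # row of the unknown being replaced
    T[r, :] = -np.concatenate(([0.0], a[i])) / a[i, i]  # -a_ij / a_ii in column j
    T[r, 0] = g[i] / a[i, i]                           # g_i / a_ii multiplies x_0 = 1
    T[r, r] = 0.0                                      # the old value of this unknown is discarded
    return T

def cycle_matrix(a, g):
    """T = T_p ... T_2 T_1, formed by multiplying each new factor on the left."""
    T = np.eye(len(g) + 1)
    for i in range(len(g)):
        T = substitution_matrix(a, g, i) @ T
    return T

def gauss_seidel_cycle(a, g, x):
    """One ordinary cycle of substitutions, for comparison."""
    x = np.array(x, dtype=float)
    for i in range(len(g)):
        x[i] = (g[i] - a[i] @ x + a[i, i] * x[i]) / a[i, i]
    return x

a = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
g = np.array([1.0, 2.0, 3.0])
x_trial = np.array([0.1, -0.2, 0.3])
X0 = np.concatenate(([1.0], x_trial))    # unity at the head of the column
T = cycle_matrix(a, g)
X_p = T @ X0
```

Every $T_i$ keeps the first row (1, 0, ..., 0), so the head of the column stays equal to unity after any number of factors.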

The advantage lies in the fact that T may readily be squared, and $T^2 X_0$ gives a result equivalent to that of two full cycles of iteration by the Gauss-Seidel method. Furthermore, $T^2$ may be squared to give $T^4$, which may also be squared, and so on. Obviously k such squarings give a matrix which, when multiplied by $X_0$, yields the same result as $2^k$ complete cycles of the original substitutions. In terms of the number k of squarings the number of decimal places of accuracy tends to increase in geometric instead of arithmetic progression. This modification of the classical method does not seem to have been published heretofore, though both it and the method of Section 7 have been in use by the author and his students since 1936.
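The squaring device can be sketched as follows, assuming NumPy. For brevity the sketch obtains T in closed form: a full cycle solves (D + L) x_new = g - U x_old, with D + L the lower triangle of the coefficient matrix (diagonal included) and U its strict upper triangle, which gives the same T as the product $T_p \cdots T_1$. This closed form is a standard rewriting, not part of the text, and the helper names are hypothetical.

```python
import numpy as np

def cycle_matrix_closed_form(a, g):
    """Homogeneous (p+1)-rowed matrix of one full Gauss-Seidel cycle.

    A cycle solves (D + L) x_new = g - U x_old, where D + L is the lower
    triangle of a (diagonal included) and U its strict upper triangle.
    """
    a = np.asarray(a, dtype=float)
    g = np.asarray(g, dtype=float)
    p = len(g)
    lower_inv = np.linalg.inv(np.tril(a))
    T = np.zeros((p + 1, p + 1))
    T[0, 0] = 1.0                            # keeps x_0 = 1
    T[1:, 0] = lower_inv @ g
    T[1:, 1:] = -lower_inv @ np.triu(a, k=1)
    return T

def square_k_times(T, k):
    """Return T^(2^k) using k matrix squarings."""
    for _ in range(k):
        T = T @ T
    return T

a = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
g = np.array([1.0, 2.0, 3.0])
X0 = np.array([1.0, 0.0, 0.0, 0.0])         # unity, then zero trial values
T = cycle_matrix_closed_form(a, g)

k = 4
X_fast = square_k_times(T, k) @ X0          # equivalent to 2^k = 16 cycles

x = X0[1:].copy()
for _ in range(2 ** k):                     # the same 16 cycles done one by one
    for i in range(len(g)):
        x[i] = (g[i] - a[i] @ x + a[i, i] * x[i]) / a[i, i]
```

Four squarings stand in for sixteen cycles; six would stand in for sixty-four.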

R. A. Fisher [11, Sec. 29] has introduced the valuable method of finding the inverse of a matrix A by solving together p systems, each of p equations in p unknowns, with the same matrix A of coefficients but different columns of right-hand members; these several columns of right-hand members are the columns of the identical matrix, and the corresponding columns of unknowns are the columns of the inverse. The technique of carrying this out by any of the methods resembling that of Doolittle is a simple extension involving replacement of the right-hand members of the equations by 1's and 0's and carrying along p such columns instead of one while applying exactly the same linear operations to the rows as in the older problem. This, like the problem of solving linear equations, has been elegantly adapted to efficient calculation with modern machines by Dwyer [7, 8, 9]. The foregoing iterative methods may also be applied in this case, but the matrix T will be different for the different columns, since the right-hand members enter its first column. When the given matrix is symmetric (as is implied by the positive definite character assumed in the proofs of convergence) the number of iterations required is generally cut down because the determination of each column determines also the elements of the corresponding row which lie in other columns. Iteration by groups [14] may well have a place here.
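The extension to p right-hand columns can be sketched by carrying the columns of the identical matrix through the same elimination, applying each row operation to all of them at once. A minimal NumPy sketch without interchange of rows; the helper name and the example matrices are hypothetical.

```python
import numpy as np

def inverse_by_elimination(a):
    """Inverse of a by solving p systems at once: the right-hand members are the
    p columns of the identical matrix, and every row operation of the
    elimination is applied to all p columns together."""
    a = np.array(a, dtype=float)
    p = a.shape[0]
    rhs = np.eye(p)
    for k in range(p - 1):
        for i in range(k + 1, p):
            factor = a[i, k] / a[k, k]
            a[i, k:] -= factor * a[k, k:]
            rhs[i] -= factor * rhs[k]        # same operation on all p columns
    X = np.zeros((p, p))
    for i in range(p - 1, -1, -1):           # back solution, row by row
        X[i] = (rhs[i] - a[i, i + 1:] @ X[i + 1:]) / a[i, i]
    return X

A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
A_inv = inverse_by_elimination(A)
```

Each column of the result is the solution for one column of the identical matrix, so `A @ A_inv` reproduces the identical matrix.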

An observation of A. C. Aitken's [1] is noteworthy in connection with the solution of equations with a non-symmetric matrix, and with the finding of the inverse of such a matrix. Writing the equations in the matrix form AX = G, we see that the solution $X = A^{-1}G$ is also the solution of the system $(A'A)X = A'G$, where A' is the transverse (also called the transpose or conjugate) of A. Evidently A'A and A'G can be formed by direct multiplications and additions, without divisions. Since A'A is symmetric, and positive definite when A is nonsingular, any of the methods for solving symmetric equations are applicable to the new system. To find the inverse of A we may first find the inverse of the symmetric matrix A'A and then postmultiply it by A'; for $(A'A)^{-1}A' = A^{-1}$.
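A minimal sketch of Aitken's device, assuming NumPy and a hypothetical non-symmetric example: A'A and A'G are formed by multiplications and additions only, and the symmetric system is then solved by the Gauss-Seidel cycle, which converges here because A'A is positive definite.

```python
import numpy as np

def gauss_seidel_solve(a, g, cycles=200):
    """Gauss-Seidel cycles for a symmetric positive definite system a x = g."""
    a = np.asarray(a, dtype=float)
    g = np.asarray(g, dtype=float)
    x = np.zeros(len(g))
    for _ in range(cycles):
        for i in range(len(g)):
            x[i] = (g[i] - a[i] @ x + a[i, i] * x[i]) / a[i, i]
    return x

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])      # hypothetical non-symmetric coefficient matrix
G = np.array([1.0, 2.0, 3.0])

AtA = A.T @ A                       # formed by multiplications and additions only
AtG = A.T @ G
X = gauss_seidel_solve(AtA, AtG)    # a method for symmetric systems
A_inv = np.linalg.inv(AtA) @ A.T    # (A'A)^-1 A' = A^-1
```

Here A'A works out to [[5, 2, 4], [2, 10, 3], [4, 3, 17]], and the solution X satisfies the original equations AX = G.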

6. Roots, norms and convergence of matrices. The norm of a matrix $A$ may
be defined as the square root of the sum of the products of its elements by their
complex conjugates, and denoted by $N(A)$. If $A$ is real and $a_{ij}$ is the element
in the $i$th row and $j$th column,

$$N(A) = \sqrt{\sum_{i,j} a_{ij}^2}. \tag{6.1}$$

This is the same function which Wedderburn [29, p. 125] defines as the absolute
value of $A$ and denotes by $A$ with a heavy vertical bar on each side. Since it
is rather troublesome to avoid confusing this with the determinant of $A$, we use
the notation $N(A)$, though the analogy with the ordinary absolute value of a
quantity is very suggestive in connection with proofs of convergence and limits
of error obtained by means of the "triangular inequalities" below. Rella [25]
gives a different definition of the absolute value of the matrix as the maximum
of the absolute values of its roots.

The triangular inequalities, whose proof is easy with the help of the Cauchy
inequality, are:

$$N(A + B) \le N(A) + N(B), \tag{6.2}$$

$$N(AB) \le N(A)\,N(B). \tag{6.3}$$

From the last it follows that for any positive integer $m$,

$$N(A^m) \le [N(A)]^m. \tag{6.4}$$

Hence if $N(A) < 1$, the limit of $N(A^m)$ as $m$ increases is zero. It then follows
that the limit of $A^m$ itself is zero, i.e. that each of its elements approaches zero,
because of the definition of the norm.

The identical matrix of $p$ rows, which we shall denote simply by 1, has the
norm $\sqrt{p}$, while a scalar matrix $k$ (i.e. one with the quantity $k$ in each element
of the principal diagonal and zeros elsewhere) has the norm $|k|\sqrt{p}$. The norm
of a $p$-rowed orthogonal matrix is $\sqrt{p}$.

The roots of a square matrix, also known as the latent roots or characteristic
roots, are the values $\lambda_1, \dots, \lambda_p$ of $\lambda$ for which the determinant obtained by
subtracting $\lambda$ from each element of the principal diagonal vanishes. By expanding
this determinant in powers of $\lambda$ and using a relation between roots and
coefficients of an equation, it is evident that the sum of the roots equals the sum of
the elements in the principal diagonal. This sum is known as the trace of the
matrix and denoted by $\operatorname{tr}(A)$. Thus

$$\lambda_1 + \lambda_2 + \cdots + \lambda_p = \operatorname{tr}(A). \tag{6.5}$$

From the definitions of the transverse and norm of $A$ it is plain that

$$[N(A)]^2 = \operatorname{tr}(AA') \tag{6.6}$$

if $A$ is real.

If $f(x)$ is any polynomial in $x$, $f(A)$ is a matrix whose roots are known [29,
p. 30] to be $f(\lambda_i)$, $(i = 1, 2, \dots, p)$. In particular, the roots of $A^m$ are $\lambda_i^m$.
Consequently

$$\lambda_1^m + \lambda_2^m + \cdots + \lambda_p^m = \operatorname{tr}(A^m). \tag{6.7}$$

All the roots of a zero matrix are zero. But the fact that all the roots of a
matrix are zero does not necessarily imply that the matrix is zero; for example
the roots of



are both zero. But for real symmetric matrices the vanishing of all the roots
does imply the vanishing of the matrix; for the sum of the squares of the elements
of a symmetric matrix equals the sum of squares of the roots, since $A = A'$,
and by (6.7), (6.6) and (6.1),

$$\sum_i \lambda_i^2 = \operatorname{tr}(A^2) = \operatorname{tr}(AA') = [N(A)]^2 = \sum_{i,j} a_{ij}^2.$$

Moreover, by continuity considerations, a sequence of $p$-rowed symmetric
matrices must approach zero if all the roots approach zero, and conversely.

From this it is evident that a necessary and sufficient condition that $A^m$
approach zero as $m$ increases, when $A$ is symmetric, is that all the roots of $A$
be less than unity in absolute value. This provides a sharper criterion of
convergence than the requirement that $N(A) < 1$, which is sufficient but not
necessary for convergence. The latter is however far easier to apply in most
numerical work, since it is far easier to compute $N(A)$ than the greatest root.
Moreover it is easy to set an upper bound for $N(A)$ in various ways, of which the
crudest is to notice that, by (6.1), $N(A)$ cannot exceed $p$ times the greatest
absolute value of any element of $A$. Also, the test in terms of the norm is
applicable to asymmetric as well as symmetric matrices.

From these considerations regarding the convergence of $A^m$ we deduce at once
the following result. If the norm of a square matrix is less than unity, then all
the roots are less than unity in absolute value. The converse is not true, as the
example (6.8) shows.

For any real square matrix $A$, symmetric or not,

$$\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_p^2 \le [N(A)]^2. \tag{6.9}$$

To prove this, we observe first that $2a_{ij}a_{ji} \le a_{ij}^2 + a_{ji}^2$, and consequently
$\operatorname{tr}(A^2) \le \operatorname{tr}(AA')$. From (6.7) and (6.6) we then have
$\sum \lambda_i^2 = \operatorname{tr}(A^2) \le \operatorname{tr}(AA') = [N(A)]^2$. This reasoning shows
incidentally that $\sum \lambda_i^2$ is real, though the individual roots may be complex.
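The norm relations (6.2), (6.3), (6.6) and (6.9) can be checked numerically. This is a minimal Python sketch on arbitrary matrices chosen for illustration. For (6.9) it uses a hypothetical 2 × 2 matrix `R` whose roots are $+i$ and $-i$: $\sum \lambda_i^2 = -2$ is real, equals $\operatorname{tr}(R^2)$, and does not exceed $[N(R)]^2 = 2$.

```python
import cmath
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def norm(M):
    """N(M) of (6.1) for a real matrix: root of the sum of squared elements."""
    return math.sqrt(sum(x * x for row in M for x in row))


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def roots_2x2(M):
    """Latent roots of a 2 x 2 matrix from lambda^2 - tr(M) lambda + det(M) = 0."""
    t = trace(M)
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    r = cmath.sqrt(t * t - 4 * det)
    return (t + r) / 2, (t - r) / 2


A = [[1.0, 2.0], [3.0, 4.0]]       # arbitrary real matrices for illustration
B = [[0.5, -1.0], [2.0, 0.0]]
A_plus_B = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

tri_sum = norm(A_plus_B) <= norm(A) + norm(B)                            # (6.2)
tri_prod = norm(matmul(A, B)) <= norm(A) * norm(B)                       # (6.3)
trace_form = math.isclose(norm(A) ** 2, trace(matmul(A, transpose(A))))  # (6.6)

# (6.9) with complex roots: R rotates the plane, its roots are +i and -i.
R = [[0.0, -1.0], [1.0, 0.0]]
sum_sq_roots = sum(lam ** 2 for lam in roots_2x2(R))   # complex type, real value -2
print(tri_sum, tri_prod, trace_form, sum_sq_roots, trace(matmul(R, R)), norm(R) ** 2)
```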

Not only for investigating convergence, but also in the important but
neglected problems of setting definite limits of error after a finite number of steps,
the norm is an extremely useful function. If a matrix is to be computed with
such accuracy that the error in each element is less than $\delta$, and $\Delta$ is the matrix
of errors, the requisite accuracy will according to (6.1) be attained when
$N(\Delta) < \delta$. The definition and theorems regarding the norm are valid without any
restriction to square matrices, for which alone the roots are defined. For
example, we may use the norm to derive an inequality concerning the solution
of the system of $p$ linear equations

$$\sum_j a_{ij} x_j = g_i, \qquad (i = 1, 2, \dots, p),$$

which may be written in matrix form $AX = G$, where $A$ is a square matrix and
$X$ and $G$ are matrices each of one column and $p$ rows. From (6.3) we find
$N(G) \le N(A)N(X)$, whence

$$N(X) \ge N(G)/N(A).$$

We shall now deduce a result which seems to be new to matrix theory and
which we shall later apply to find limits of error. If $A$ is any matrix such that
$1 - A$ is non-singular the identity

$$(1 - A)^{-1} = 1 + A + A^2 + \cdots + A^{m-1} + A^m(1 - A)^{-1}$$

holds, and may be demonstrated exactly as if $A$ were an ordinary scalar quantity.
Suppose that $N(A) \le k < 1$. Taking the norm and using (6.2), (6.3) and
(6.4), we have

$$N[(1 - A)^{-1}] \le p^{1/2} + k + k^2 + \cdots + k^{m-1} + k^m N[(1 - A)^{-1}].$$

Since $k < 1$ we may solve for $N[(1 - A)^{-1}]$. Summing the geometric
progression, we obtain:

$$N[(1 - A)^{-1}] \le \frac{p^{1/2} - 1}{1 - k^m} + \frac{1}{1 - k}.$$

This holds for every positive integral value of $m$, and therefore in the limit when
$m$ becomes infinite. Thus we find that

$$N[(1 - A)^{-1}] \le p^{1/2} - 1 + \frac{1}{1 - k} \tag{6.10}$$

whenever $N(A) \le k < 1$.
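A quick numerical check of (6.10), and of the finite-$m$ form that precedes it, is sketched below. The 2 × 2 matrix is an arbitrary illustrative choice with $N(A) \approx .387$. We take $k = N(A)$ itself and invert $1 - A$ by the explicit 2 × 2 formula.

```python
import math


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def norm(M):
    return math.sqrt(sum(x * x for row in M for x in row))


A = [[0.1, 0.2], [0.3, 0.1]]     # arbitrary illustrative matrix with N(A) < 1
p = len(A)
k = norm(A)                       # take k = N(A) itself, about 0.387
one_minus_A = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(p)] for i in range(p)]

lhs = norm(inverse_2x2(one_minus_A))           # N[(1 - A)^{-1}], about 1.764
rhs = math.sqrt(p) - 1 + 1 / (1 - k)           # right-hand member of (6.10), about 2.046
# The finite-m bound is larger for every m and tends to rhs as m grows.
finite_m = [(math.sqrt(p) - 1) / (1 - k ** m) + 1 / (1 - k) for m in (1, 2, 5)]
print(round(lhs, 4), round(rhs, 4), [round(x, 4) for x in finite_m])
```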

7. An efficient inversion procedure. Let $C_0$ be an approximation to the
inverse of a matrix $A$, and consider the following sequence of operations.
Calculate

$$C_1 = C_0(2 - AC_0), \tag{7.1}$$

and then in turn $C_2, C_3, \dots$, where

$$C_{m+1} = C_m(2 - AC_m). \tag{7.2}$$

Let us inquire as to the conditions under which the sequence of matrices $C_m$
converges to $A^{-1}$, the maximum error that may be committed in stopping at
any stage, and the rate of convergence. Suppose that $C_0$ is a good enough
approximation to $A^{-1}$ to make the roots of the matrix

$$D = 1 - AC_0 \tag{7.3}$$

all less than unity in absolute value. Then increasing powers of $D$ approach
zero, and the convergence of $C_m$ to $A^{-1}$ will follow from the relation

$$C_m = A^{-1}(1 - D^{2^m}), \tag{7.4}$$

which will now be proved by mathematical induction. From (7.1) and (7.3),

$$C_1 = A^{-1}(AC_0)(1 + D) = A^{-1}(1 - D)(1 + D) = A^{-1}(1 - D^2),$$

so that (7.4) is verified for $m = 1$. Now assume (7.4) for a particular value
of $m$, and substitute it in (7.2). This gives

$$C_{m+1} = A^{-1}(1 - D^{2^m})(1 + D^{2^m}) = A^{-1}(1 - D^{2^{m+1}}),$$

which being of the same form as (7.4) completes the induction.
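The iteration (7.2) is short enough to sketch directly. The Python below is a minimal illustration using plain lists of floats. It applies (7.2) to Dwyer's matrix and the one-decimal starting value $C_0$ of the first example in Sec. 8, and records $N(D_m)$ with $D_m = 1 - AC_m$. By (7.4) and (6.3) each recorded norm should be no larger than the square of the one before.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def norm(M):
    return math.sqrt(sum(x * x for row in M for x in row))


def hotelling_step(A, C):
    """One application of (7.2): returns C (2 - A C)."""
    two = [[2.0 * x for x in row] for row in identity(len(A))]
    return matmul(C, sub(two, matmul(A, C)))


# Dwyer's matrix and the one-decimal approximation C_0 of the first example, Sec. 8.
A = [[1.0, 0.4, 0.5, 0.6],
     [0.4, 1.0, 0.3, 0.4],
     [0.5, 0.3, 1.0, 0.2],
     [0.6, 0.4, 0.2, 1.0]]
C = [[2.1, -0.2, -0.8, -1.0],
     [-0.2, 1.3, -0.2, -0.4],
     [-0.8, -0.2, 1.4, 0.3],
     [-1.0, -0.4, 0.3, 1.7]]

norms = []
for m in range(4):
    D_m = sub(identity(4), matmul(A, C))   # D_m = 1 - A C_m, equal to D^(2^m) by (7.4)
    norms.append(norm(D_m))
    C = hotelling_step(A, C)               # C now holds C_{m+1}

print(["%.3g" % x for x in norms])
```

The first norm is $\sqrt{.0052}$, as in Sec. 8. After four steps `C` holds $C_4$, which agrees to rounding error with the exact inverse, whose first row is $(379, -35, -142, -185)/183$.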

If $N(D) \le k < 1$ the roots of $D$ are all less than unity in absolute value, as
shown in Sec. 6, and the foregoing result holds. Assuming this to be true we
now derive an upper bound for the error in $C_m$ in terms of $k$ and $N(C_0)$.
According to (7.3),

$$A^{-1} = C_0(1 - D)^{-1}.$$

Hence, by (7.4),

$$C_m - A^{-1} = -A^{-1}D^{2^m} = -C_0(1 - D)^{-1}D^{2^m}.$$

Therefore, by (6.3), (6.4) and (6.10),

$$N(C_m - A^{-1}) \le N(C_0)\,k^{2^m}\left[p^{1/2} - 1 + \frac{1}{1 - k}\right]. \tag{7.5}$$

This sets an upper bound for the difference between each element of $C_m$ and the
corresponding element of $A^{-1}$. A slightly looser but simpler limit may be
obtained from this in terms of the greatest absolute value $c$ of any element of
$C_0$. Since $N(C_0) \le cp$,

$$N(C_m - A^{-1}) \le k^{2^m}\,cp\left[p^{1/2} - 1 + \frac{1}{1 - k}\right]. \tag{7.6}$$

The great value of this method, whenever a good enough initial approximation
is available to make $N(D)$ less than unity, is that the number of decimal places
of sure accuracy increases in geometric progression, rather than in arithmetic
progression as with the usual methods. Consequently this method will always
be the most efficient if a sufficiently large number of decimal places is required.
Moreover, a limit can be set in advance for the number of iterations that will be
required in order to insure any required degree of accuracy. If certainty of
correctness in the $s$th decimal place is required we may choose $m$ so that the
right-hand member of (7.6) is less than $10^{-s}/2$. In terms of logarithms to the
base 10 the number of decimal places whose accuracy is assured by $m$ iterations
is thus at least

$$2^m\,|\log k| - \log 2 - \log\left\{cp\left[p^{1/2} - 1 + (1 - k)^{-1}\right]\right\}. \tag{7.7}$$
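Formula (7.7) lends itself to a small planning function. The sketch below is hypothetical; the names `assured_places` and `steps_needed` are ours. It rests on the crude bound $N(C_0) \le cp$ behind (7.6) and (7.7). With the figures of the first example in Sec. 8 ($k = .072$, $c = 2.1$, $p = 4$) it reports roughly 0.7, 3.0 and 7.6 assured places after one, two and three iterations.

```python
import math


def assured_places(m, k, c, p):
    """Lower bound (7.7) on the number of correct decimal places after m steps.

    k bounds N(D) = N(1 - A C_0) and must be < 1; c is the largest absolute
    element of C_0; p is the order of the matrix.
    """
    log = math.log10
    return (2 ** m) * abs(log(k)) - log(2) - log(c * p * (math.sqrt(p) - 1 + 1 / (1 - k)))


def steps_needed(s, k, c, p):
    # Smallest m for which (7.7) guarantees s correct decimal places.
    m = 0
    while assured_places(m, k, c, p) < s:
        m += 1
    return m


for m in range(4):
    print(m, round(assured_places(m, 0.072, 2.1, 4), 2))
print(steps_needed(7, 0.072, 2.1, 4))   # 3
```

Because the count doubles with each step, asking for twice as many places costs only about one more iteration.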

These limits of error can be bettered after some iterations have actually been
made. When $C_r$ becomes available we may calculate $k_r = N(1 - AC_r)$, which
may be used in place of $k$ in the formulae just derived if $m$ is replaced by $m - r$
(and $C_0$ by $C_r$), and is generally enough smaller than $k$ to make a marked improvement.

The elements of the matrix of errors will actually, of course, be smaller than
the norm of this matrix in every practical case, in a ratio fluctuating about $1/p$.
The limits obtained by our formulae can be reached only in case the entire error
of the matrix $C_m$ is concentrated in one element, a very unlikely event. Thus
the limits given above will usually be quite conservative.

As the iteration proceeds the elements of the matrix $D_m = 1 - AC_m = D^{2^m}$
will diminish rapidly in case of convergence. For this reason it may sometimes
be better to calculate $C_{m+1}$ not directly from (7.2), but from the formula

$$C_{m+1} = C_m + C_mD_m, \tag{7.8}$$

in which the last term can be regarded as a correction of $C_m$ which will often be
very small. This method, however, lacks the self-checking feature, so that
its use at the final stage is dubious.

This iterative process has been noticed previously [12, p. 120], but without a
limit of error or observation of the geometric progression in the number of
accurate digits.

If the initial approximation is not good enough to make $N(D) < 1$, it may
be improved by other methods, such as those of Sections 4 and 5, to the point
at which this more rapid method becomes applicable. But in some cases (e.g.
the second example of §8) the method converges even though $N(D) > 1$, as
may be demonstrated at a later stage at which the norm of the matrix
corresponding to $D$ becomes numerically less than unity.

For the mass of least-square and other problems in which the inverse of a
matrix is needed, the best procedure appears to be to begin with one of the methods
described by Dwyer [7, 8, 9], carried to a small number of decimal places, and
then to calculate $D$ from (7.3), a step equivalent to substituting the approximate
solution obtained into the equations. It may then be evident at a glance that
the norm of $D$ is so small that the method of the present section will converge
rapidly to give as many more places as desired. If $N(D)$ is too large for this,
and if gross errors have been eliminated, there is a choice between recalculation
from the beginning, the classical iterative process, and the acceleration of this
process by matrix-squaring, with perhaps some iteration by small groups. The
choice will depend partly on how much the elements of $D$ need to be reduced.
The classical iteration (or sometimes the process of this section) is appropriate
for correcting a slight excess of $N(D)$ over unity, its matrix-squaring extension
for larger alterations.

Let $E_0$ be the error in $C_0$, so that $C_0 = A^{-1} + E_0$. Then by (7.1),

$$C_1 = (A^{-1} + E_0)(1 - AE_0) = A^{-1} - E_0AE_0.$$

If $E_1$ is the error in $C_1$, so that $C_1 = A^{-1} + E_1$, we thus have

$$E_1 = -E_0AE_0.$$

If $A$ is symmetric, we naturally take $C_0$ as a symmetric matrix, and this will
cause $E_0$, $C_1$, and $E_1$ also to be symmetric. If also $A$ is positive definite, it will
follow from the last equation that $E_1$ is negative definite, or negative
semi-definite, since for every vector $x$, $x'E_1x = -(E_0x)'A(E_0x) \le 0$.
Consequently the diagonal elements of $C_1$ tend to underestimate the
corresponding elements of $A^{-1}$, and never exceed them. Furthermore, the
value of a quadratic form whose matrix is $A^{-1}$ will be at least as great as the
estimate of it based on $C_1$. The squares, both of the multiple correlation
coefficient and the generalized Student ratio [15], can be expressed as such
quadratic forms. Hence both these statistics are slightly underestimated when
$C_1$ is used in place of the true matrix of coefficients. Later approximations $C_m$
do not change the signs of these biases, though they make their magnitudes
approach zero in case the conditions for convergence are satisfied, and definite
limits converging to zero are easily found for them in such cases from the results
above.
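The sign of the bias can be confirmed exactly. The sketch below uses Python's `fractions` module with Dwyer's matrix, the one-decimal $C_0$ of the first example in Sec. 8, and the exact inverse of $A$. That inverse is $1/183$ times an integer matrix; this is checked inside the code rather than taken from the text. The code verifies $E_1 = -E_0AE_0$ identically and shows that no diagonal element of $E_1$ is positive.

```python
from fractions import Fraction as F


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(M, s):
    return [[s * x for x in row] for row in M]


I4 = [[F(int(i == j)) for j in range(4)] for i in range(4)]
A = scale([[10, 4, 5, 6], [4, 10, 3, 4], [5, 3, 10, 2], [6, 4, 2, 10]], F(1, 10))
C0 = scale([[21, -2, -8, -10], [-2, 13, -2, -4], [-8, -2, 14, 3], [-10, -4, 3, 17]], F(1, 10))
A_inv = scale([[379, -35, -142, -185], [-35, 235, -40, -65],
               [-142, -40, 256, 50], [-185, -65, 50, 310]], F(1, 183))
assert matmul(A, A_inv) == I4                 # the exact inverse, verified

C1 = matmul(C0, sub(scale(I4, 2), matmul(A, C0)))   # (7.1)
E0 = sub(C0, A_inv)
E1 = sub(C1, A_inv)
assert E1 == scale(matmul(matmul(E0, A), E0), -1)   # E_1 = -E_0 A E_0, exactly
print([float(E1[i][i]) for i in range(4)])          # none of these is positive
```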


8. Illustrations and further comments. We shall indicate symmetric matrices
by writing only the elements on and above the principal diagonals.

(i) To illustrate various methods Dwyer [7] has evaluated the inverse of

$$A = \begin{bmatrix} 1.0 & .4 & .5 & .6 \\ & 1.0 & .3 & .4 \\ & & 1.0 & .2 \\ & & & 1.0 \end{bmatrix},$$

obtaining, to four decimal places,

$$A^{-1} = \begin{bmatrix} 2.0710 & -.1913 & -.7759 & -1.0109 \\ & 1.2842 & -.2186 & -.3552 \\ & & 1.3989 & .2732 \\ & & & 1.6940 \end{bmatrix}.$$

If the accuracy of the calculation had been only such as to insure correctness
in the first decimal place the approximation to $A^{-1}$ would have been

$$C_0 = \begin{bmatrix} 2.1 & -.2 & -.8 & -1.0 \\ & 1.3 & -.2 & -.4 \\ & & 1.4 & .3 \\ & & & 1.7 \end{bmatrix}.$$

It is easy by mental arithmetic alone, without the use of a machine or side
calculations, to see that

$$D = 1 - AC_0 = \begin{bmatrix} -.02 & .02 & 0 & -.01 \\ 0 & 0 & -.02 & .03 \\ .01 & -.01 & 0 & -.02 \\ -.02 & .04 & -.02 & 0 \end{bmatrix},$$
and further that $N(D) = \sqrt{.0052} = .072$. This is so much less than unity
that the iteration process of §7 will converge rapidly. As a matter of fact,
without determining the sum of the squares of the elements of $D$ we could
have observed at a glance that $N(D)$ must be less than four times the greatest
absolute value of an element, and thus have a value less than .16. In the same
way $N(C_0)$ is seen to be less than 8.4; actually it equals 3.8588. The latter
value, with $k = .072$, $p = 4$, substituted in (7.5) gives for the norms of the
successive error matrices $E_m = C_m - A^{-1}$

$$\begin{aligned} N(E_0) &< 8.03k = .578, \\ N(E_1) &< 8.03k^2 = .0414, \\ N(E_2) &< 8.03k^4 = .000216, \\ N(E_3) &< 8.03k^8 = .000\,000\,0058. \end{aligned}$$


This promises merely that after one application of the iterative process the
results will be accurate to one decimal place, which we know already but might
not have known for sure in such a case; that a second iteration will give results
accurate to three places, and that a third will give results accurate to about
eight places. These estimates will however be improved after actually
computing $C_1$. This may well be done by (7.1) if a machine is available; otherwise,
and almost as easily, by (7.8) we obtain

$$C_1 = \begin{bmatrix} 2.070 & -.190 & -.776 & -1.011 \\ & 1.282 & -.218 & -.355 \\ & & 1.398 & .274 \\ & & & 1.692 \end{bmatrix},$$
and $N(C_1) = 3.8163$. (We have now passed beyond the stage of easy mental
calculation, but might alternatively use the easy upper bound 8.28 for $N(C_1)$,
obtained as before.) We shall use this value instead of $N(C_0)$ in (7.5) and at
the same time use for $k$ the value of $N(D_1)$, where

$$D_1 = 1 - AC_1 = 1 - AC_0(1 + D) = 1 - (1 - D)(1 + D) = D^2.$$

This is most easily found from $D$, from which it may be written down directly
by mental calculation:

$$D_1 = 10^{-4} \times \begin{bmatrix} 6 & -8 & -2 & 8 \\ -8 & 14 & -6 & 4 \\ 2 & -6 & 6 & -4 \\ 2 & -2 & -8 & 18 \end{bmatrix}.$$

The norm of $D_1$ is seen by the crude method to be less than .0072, and is actually
.003212. Taking the latter value for $k$ we have, similarly to (7.5),

$$N(E_m) < N(C_1)\,k^{2^{m-1}} \times 2.00323 = 7.645\,k^{2^{m-1}}.$$

Thus,

$$N(E_1) < .0246, \qquad N(E_2) < .000\,0789, \qquad N(E_3) < .000\,000\,000\,8.$$

The reduction in these limits of error is due to the difference between
$[N(D)]^2 = .0052$ and $N(D^2) = .003212$.

Using $C_2 = C_1 + C_1D_1$ we obtain:

$$C_2 = \begin{bmatrix} 2.0710366 & -.1912542 & -.7759568 & -1.0109294 \\ & 1.2841486 & -.2185780 & -.3551910 \\ & & 1.3989056 & .2732260 \\ & & & 1.6939852 \end{bmatrix}.$$

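From this we calculate

$$D_2 = 1 - AC_2 = 10^{-8} \times \begin{bmatrix} 112 & -164 & -40 & 168 \\ -164 & 288 & -136 & 88 \\ 64 & -128 & 100 & -104 \\ 48 & -32 & -184 & 364 \end{bmatrix},$$

agreeing with the value obtained from the formula $D_2 = D_1^2$, and finally
$C_3 = C_2(1 + D_2) =$

$$\begin{bmatrix} 2.071\,038\,251 & -.191\,256\,831 & -.775\,956\,284 & -1.010\,928\,962 \\ & 1.284\,153\,005 & -.218\,579\,235 & -.355\,191\,257 \\ & & 1.398\,907\,104 & .273\,224\,045 \\ & & & 1.693\,989\,071 \end{bmatrix},$$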
which as shown above is correct to at least eight decimal places, and doubtless
more, in each element. The estimate of $A^{-1}$ obtained by Dwyer by several
direct methods to four places is corroborated by this result excepting for a
slight error in the element in his first row and third column.

(ii) Suppose that the approximation in the foregoing example had been even
cruder, with determination of the elements of $A^{-1}$ only to the nearest integer.
This would give

$$C_0 = \begin{bmatrix} 2 & 0 & -1 & -1 \\ & 1 & 0 & 0 \\ & & 1 & 0 \\ & & & 2 \end{bmatrix}.$$

The sum of the squares of the elements of $D$ is 1.51, so that the norm is greater
than unity, and it is not clear at this stage whether the iterative process we have
been using will converge or not. But upon computing

$$D^2 = \begin{bmatrix} .15 & -.11 & .18 & .27 \\ .01 & .17 & -.16 & .19 \\ .15 & -.27 & .36 & .09 \\ .12 & .04 & 0 & .36 \end{bmatrix},$$

we find that $N(D^2) = \sqrt{.6093} = .7806$, and since this is less than unity we are
assured that the process will converge. We may write immediately, without
use of a machine or written side calculation:


$$C_1 = C_0 + C_0D = \begin{bmatrix} 2.0 & -.1 & -.9 & -1.1 \\ & 1.0 & .1 & -.4 \\ & & 1.0 & .3 \\ & & & 1.4 \end{bmatrix}.$$


Utilizing the value of $D^2$ already determined, we readily find

$$C_2 = C_1 + C_1D^2 = \begin{bmatrix} 2.032 & -.138 & -.848 & -1.056 \\ & 1.138 & -.042 & -.372 \\ & & 1.182 & .274 \\ & & & 1.558 \end{bmatrix}.$$


From this point on a machine is needed for efficiency. The next step is to
calculate $D^4$, either by squaring $D^2$ or by the formula $D^4 = 1 - AC_2$; both
methods may be used as a check. The result is:

$$D^4 = 10^{-4} \times \begin{bmatrix} 808 & -730 & 1094 & 1330 \\ 20 & 786 & -830 & 890 \\ 846 & -1560 & 1998 & 540 \\ 616 & 80 & 152 & 1690 \end{bmatrix}.$$

We may now consider the accuracy of further approximations, inserting in
(7.5) $N(C_2) = \sqrt{13.3872} = 3.659$ in place of $N(C_0)$, $m - 2$ for $m$, and
$k = N(D^4) = .4119$. Thus

$$\begin{aligned} N(E_2) &< (9.8807)(.4119) = 4.0699, \\ N(E_3) &< (9.8807)(.4119)^2 = 1.6764, \\ N(E_4) &< (9.8807)(.4119)^4 = .2844, \\ N(E_5) &< (9.8807)(.4119)^8 = .00819, \\ N(E_6) &< (9.8807)(.4119)^{16} = .000\,006\,79. \end{aligned}$$


Because of the roughness of the initial approximation in this case the con- 
vergence is rather slow at first, but later it is much accelerated. So far as the 
limits found above show, five iterations are necessary to be sure of even
approximate two-place accuracy in the results (somewhat better limits could be
obtained after actually calculating $C_3$, still better ones from $C_4$, etc.), but the
sixth iteration gives results sure to be accurate nearly to five places. Perhaps
the best treatment of a numerical case of this kind is to work out the solution 
by Dwyer’s method to two, three or four places, and then to apply the iterative 
process once, and as many more times as necessary to obtain the required 
accuracy. 

The final step should, for the sake of checking, be a calculation of $C_{m+1}$ from
$C_m(2 - AC_m)$, rather than from $C_m + C_mD_m$.

Upon observing that N{D) > 1 we might have used the Seidel process to 
improve each row of Co . This process is however extremely slow, and in the 
present example is markedly inferior to that used above. 
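Example (ii) can be replayed exactly. This sketch uses Python's `fractions` module so that the sums of squares 1.51 (for $D$) and .6093 (for $D^2$) come out exactly. It also checks the first row of $C_2$, $(2.032, -.138, -.848, -1.056)$, and confirms that $1 - AC_2$ equals $D^4$.

```python
from fractions import Fraction as F


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sum_sq(M):
    return sum(x * x for row in M for x in row)


I4 = [[F(int(i == j)) for j in range(4)] for i in range(4)]
A = [[F(x, 10) for x in row] for row in
     [[10, 4, 5, 6], [4, 10, 3, 4], [5, 3, 10, 2], [6, 4, 2, 10]]]
C0 = [[F(x) for x in row] for row in                  # nearest-integer C_0
      [[2, 0, -1, -1], [0, 1, 0, 0], [-1, 0, 1, 0], [-1, 0, 0, 2]]]

D = sub(I4, matmul(A, C0))
D2 = matmul(D, D)
C1 = add(C0, matmul(C0, D))      # C_1 = C_0 + C_0 D, equivalent to (7.1)
C2 = add(C1, matmul(C1, D2))     # C_2 = C_1 + C_1 D^2, by (7.8)
print(float(sum_sq(D)), float(sum_sq(D2)))   # N(D)^2 > 1 but N(D^2)^2 < 1
```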

(iii) If we start from the result which Dwyer gives to four decimal places as
$C_0$, we obtain

We find $N(C_0) = 3.8188$, and putting $k = N(D) = .00085$ we have from (7.5),

$$N(E_m) < 3.8188\,(.00085)^{2^m}(2.00085) < (7.6408)(.00085)^{2^m}.$$

Thus

$$N(E_1) < .000\,0055, \qquad N(E_2) < .000\,000\,000\,004.$$


9. Certain other methods of successive approximation. A class of methods
for solving linear equations, which may be extended to find the inverse of a
matrix, is given by Frazer, Duncan and Collar [12, pp. 132-133], generalizing
a method of J. Morris. In this method the matrix $A$ of the coefficients in the
linear equations, or the matrix to be inverted, is written as the sum of an easily
inverted matrix $V$, for example a diagonal or triangular matrix, and another
matrix $W$. Then

$$A^{-1} = (V + W)^{-1} = (1 + V^{-1}W)^{-1}V^{-1} = (1 - J)^{-1}V^{-1},$$

where $J = -V^{-1}W$. If the latent roots of $J$ are all less than unity in absolute
value, and a fortiori if $N(J) < 1$, the series

$$1 + J + J^2 + J^3 + \cdots = (1 - J)^{-1}$$

converges. To solve the equations $AX = G$, where $X$ and $G$ are column vectors
(i.e. matrices of one column) is to determine

$$X = A^{-1}G = (1 - J)^{-1}H,$$

where $H = V^{-1}G$. The method of Frazer, Duncan and Collar is to calculate
the successive vectors

$$X_0 = H, \quad X_1 = H + JX_0, \quad X_2 = H + JX_1, \quad \dots, \quad X_r = H + JX_{r-1}, \quad \dots.$$

It is clear that

$$X_r = (1 + J + J^2 + \cdots + J^r)H.$$

The error in $X_r$ is therefore the vector

$$E_r = X_r - X = -J^{r+1}(1 - J)^{-1}H.$$

We may ascertain a limit for the errors if $N(J) \le k < 1$. Indeed, by (6.3), (6.4)
and (6.10),

$$N(E_r) \le k^{r+1}\left[p^{1/2} - 1 + \frac{1}{1 - k}\right] N(H),$$

where $p$ is the number of unknowns; and no individual unknown will have an
error greater than $N(E_r)$.

Convergence of this method, if existent, may be accelerated by matrix-squaring.
Indeed, upon calculating in turn $J^2, J^4, J^8, \dots$ by repeated squarings, we
need only to work with the sequence

$$X_0 = H, \quad X_1 = (1 + J)X_0, \quad X_3 = (1 + J^2)X_1, \quad X_7 = (1 + J^4)X_3, \quad X_{15} = (1 + J^8)X_7, \quad \dots,$$

omitting the intermediate approximations. This will be worth while for solving
a single set of equations only in case such great accuracy is required as to
demand the use of rather high powers of $J$. Each squaring of $J$ consists of the
formation of $p^2$ sums of $p$ products, so that determination of, say, $X_{31}$ by this
method requires $4p^2$ such sums after $J$ has been determined, in addition to the
$5p$ involved in finding $X_1, X_3, X_7, X_{15}, X_{31}$ after the squarings. By the
method of Frazer, Duncan and Collar the corresponding number of sums of
products would be $31p$. Since $4p^2 + 5p < 31p$ only in case $p < 6.5$, it appears
that the matrix-squaring is justified only for six or fewer unknowns unless a
larger number of terms is required. Furthermore, increasingly high powers of
a matrix, to be useful, need usually to be expressed with more and more
significant digits.
If more than one system of equations with the same matrix $A$ is to be solved,
these methods have the advantage that the same matrix $J$ can be used for all
the vectors $G$ of right-hand members. In such cases the value of matrix-squaring
is enhanced in comparison with that in which only a single system of equations
is to be solved. Determination of $A^{-1}$ is equivalent to solving $p$ such
systems in which the several column vectors $G$ together constitute the identical
matrix. If more than $p$ of these systems of equations are to be solved it is best
to find $A^{-1}$ and then form the various solutions $A^{-1}G$ from the columns $G$ of
right-hand members.

It is worth noticing that the matrices $1 + J$, $1 + J^2$, etc., are commutative, as
are all rational functions of a single matrix. In difficult cases this may
occasionally provide a useful check.

This method differs from the other iterative methods with which we are
concerned in that errors of calculation are not automatically corrected by it.
This is a serious disadvantage, especially for the inexperienced computer, and
makes desirable the careful maintenance of a check column. On the other hand,
it does not require any preliminary knowledge of the solution. Indeed, it should
be classified rather with the direct than with the iterative procedures on this
account.

The critical element in determining the success of this method is the
possibility or impossibility of finding suitable matrices $V$ and $W$, such that $V^{-1}$ can
be calculated easily, and such that the elements of $J = -V^{-1}W$ are sufficiently
small to make the roots all numerically less than unity. Morris uses for $V$ the
matrix derived from $A$ by replacing all the elements above the principal diagonal
by zeros. This insures that the corresponding positions in $V^{-1}$ are also occupied
by zeros. The other elements of $V^{-1}$ are then determined fairly easily. If the
non-diagonal elements of $A$, which appear in $W$, are sufficiently small, this fact
will insure small enough elements in $J$ to make convergence rapid.

A second method, given by Frazer, Duncan and Collar, chooses for $V$ a
diagonal matrix (one having only zero elements except in the principal diagonal),
or simply the unit matrix. This choice reduces the labor of inversion to a
minimum. Successful convergence will take place when the non-diagonal
elements of $A$ are sufficiently small in comparison with those in the diagonal,
if $V$ is taken as the diagonal matrix containing the diagonal elements of $A$.
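Here is a sketch of the Frazer, Duncan and Collar iteration with $V$ taken as the diagonal of $A$, together with its matrix-squaring acceleration. The 3 × 3 system is hypothetical and chosen to be strongly diagonal, so that $N(J) = .5$. Dwyer's matrix of Sec. 8 would not serve with this choice of $V$, since its $N(J)$ exceeds unity. After 31 steps the plain sequence and the squared sequence (four squarings, five matrix-vector products) should agree to rounding error. Both should lie within the limit for $N(E_{31})$ given earlier in this section.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(u, v):
    return [a + b for a, b in zip(u, v)]


def vnorm(v):
    return math.sqrt(sum(x * x for x in v))


# Hypothetical, strongly diagonal system; V is the diagonal of A, W the rest.
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
G = [1.0, 2.0, 3.0]
p = len(A)

J = [[0.0 if i == j else -A[i][j] / A[i][i] for j in range(p)] for i in range(p)]  # -V^{-1} W
H = [G[i] / A[i][i] for i in range(p)]                                               # V^{-1} G

# Plain sequence: X_0 = H, X_r = H + J X_{r-1}; 31 steps give X_31.
X_plain = H[:]
for _ in range(31):
    X_plain = vadd(H, matvec(J, X_plain))

# Matrix-squaring: X_1 = (1 + J) X_0, X_3 = (1 + J^2) X_1, ..., X_31 = (1 + J^16) X_15.
X_fast, Jpow = H[:], J
for f in range(5):
    if f:
        Jpow = matmul(Jpow, Jpow)            # four squarings in all
    X_fast = vadd(X_fast, matvec(Jpow, X_fast))

k = math.sqrt(sum(x * x for row in J for x in row))             # N(J) = 0.5
bound = k ** 32 * (math.sqrt(p) - 1 + 1 / (1 - k)) * vnorm(H)    # limit for N(E_31)
exact = [5 / 28, 2 / 7, 19 / 28]
print(X_plain, X_fast, bound)
```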

A third method which may be useful in certain cases, particularly when some
but not all of the unknowns are required, is the following. Let $A$ be partitioned:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$

where $a$ and $d$ are square submatrices which, being of lower order than $A$, are
more easily inverted. Let $V$ and $W$ be the correspondingly partitioned matrices

$$V = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}, \qquad W = \begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix}.$$

Putting $s = a^{-1}b$, $t = d^{-1}c$, we have:

$$J = -V^{-1}W = -\begin{bmatrix} 0 & s \\ t & 0 \end{bmatrix}, \qquad J^2 = \begin{bmatrix} st & 0 \\ 0 & ts \end{bmatrix}.$$

If only the first $q$ of the $p$ unknowns are required, $a$ and $b$ may be taken as
matrices of $q$ rows. If $G_1$ and $H_1$ consist respectively of the first $q$ rows of $G$
and $H$, and if $G_2$ and $H_2$ consist of the remaining rows, then $H_1 = a^{-1}G_1$ and
$H_2 = d^{-1}G_2$. Then, in case of convergence, the first $q$ rows of the solution
are given by

$$(1 + st)[1 + (st)^2][1 + (st)^4]\cdots H_1 - s(1 + ts)[1 + (ts)^2][1 + (ts)^4]\cdots H_2.$$

Convergence to the correct values is assured here if the norm of any power of
$st$ is less than unity, as is true if and only if the absolute values of all the roots of
$st$ are less than unity. This is easily seen to be true, since as $m$ increases

$$\lim\,(ts)^m = t\left[\lim\,(st)^{m-1}\right]s.$$
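Here is a sketch of the partitioned method. It uses Dwyer's matrix of Sec. 8 split into 2 × 2 blocks (so $q = 2$) and a hypothetical right-hand side $G = (1, 1, 1, 1)'$. The helper `doubling_series` (a name of ours) applies the factors $(1 + M)(1 + M^2)(1 + M^4)\cdots$. Six factors sum the series through the 63rd power, which is ample here because the larger root of $st$ is about .55. The exact first two unknowns are $17/183$ and $95/183$.

```python
def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(u, v):
    return [a + b for a, b in zip(u, v)]


def vsub(u, v):
    return [a - b for a, b in zip(u, v)]


def doubling_series(M, v, factors):
    """(1 + M)(1 + M^2)(1 + M^4)... applied to v: sums M^j v for j < 2**factors."""
    P = M
    for f in range(factors):
        if f:
            P = matmul(P, P)
        v = vadd(v, matvec(P, v))
    return v


# Dwyer's matrix in 2 x 2 blocks [[a, b], [c, d]]; only the first q = 2 unknowns wanted.
a = [[1.0, 0.4], [0.4, 1.0]]
b = [[0.5, 0.6], [0.3, 0.4]]
c = [[0.5, 0.3], [0.6, 0.4]]
d = [[1.0, 0.2], [0.2, 1.0]]
G1, G2 = [1.0, 1.0], [1.0, 1.0]   # hypothetical right-hand side G = (1, 1, 1, 1)

s = matmul(inv2(a), b)
t = matmul(inv2(d), c)
H1, H2 = matvec(inv2(a), G1), matvec(inv2(d), G2)
st, ts = matmul(s, t), matmul(t, s)

X1 = vsub(doubling_series(st, H1, 6), matvec(s, doubling_series(ts, H2, 6)))
print(X1)   # first two unknowns only
```

Only the 2 × 2 blocks $a$ and $d$ are ever inverted; the remaining unknowns, if wanted later, follow from $X_2 = H_2 - tX_1$.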

10. A simple iterative method of solving equations. An entirely different
method, whose convergence is independent of the initial trial values, is the
following. To solve for the column vector $X$ the equation $AX = G$, we may
start with an arbitrary column of trial values $X_0$ and a scalar constant $h$, and
then for $m = 1, 2, \dots$ calculate $X_m$ from

$$X_m = hG + (1 - hA)X_{m-1}.$$

If $X_m$ is equal to $X_{m-1}$ it is obviously the desired solution. Otherwise there
is an error

$$\begin{aligned} X_m - X &= (hG - X) + (1 - hA)X_{m-1} = (hA - 1)X + (1 - hA)X_{m-1} \\ &= (1 - hA)(X_{m-1} - X) = \cdots = (1 - hA)^m(X_0 - X). \end{aligned}$$

This converges to zero as $m$ increases provided the latent roots of $1 - hA$ are
all less than unity in absolute value. If $A$ has only real roots this is equivalent
to requiring that they all be between 0 and $2/h$. In particular, if $A$ is a
correlation matrix, its roots are all real and positive. Since their sum $= \operatorname{tr}(A) = p$,
where $p$ is the number of rows, all roots of $A$ lie between 0 and $p$.
Consequently the process will converge in this case if $0 < h < 2/p$. It is desirable, in
order to make the error diminish as fast as possible, to take $h$ as large as is
consistent with convergence. In some cases an upper limit lower than $p$ will be known for
the greatest root of $A$, and then a larger value than $2/p$ can be taken for $h$.
A limit of error is obviously set by

$$N(X_m - X) \le [N(1 - hA)]^m N(X_0 - X).$$

This method can of course be applied to find the inverse matrix.

It can also be accelerated by matrix-squaring. If we put $D = 1 - hA$ we
have for example

$$X_8 = (1 + D)(1 + D^2)(1 + D^4)hG + D^8X_0.$$

The last term will approach zero in case of convergence, and may be omitted
in this type of calculation.
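The iteration of this section and its squared form are sketched below. The example assumes Dwyer's correlation matrix from Sec. 8, a hypothetical right-hand side $G = (1, 1, 1, 1)'$, and $h = .4$, inside $0 < h < 2/p = .5$. Starting from $X_0 = 0$ makes the term $D^{2^n}X_0$ vanish, so the nine-factor product gives $X_{512}$ exactly, apart from rounding.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A = [[1.0, 0.4, 0.5, 0.6],
     [0.4, 1.0, 0.3, 0.4],
     [0.5, 0.3, 1.0, 0.2],
     [0.6, 0.4, 0.2, 1.0]]
G = [1.0, 1.0, 1.0, 1.0]      # hypothetical right-hand side
p, h = 4, 0.4                  # 0 < h < 2/p = 0.5
D = [[(1.0 if i == j else 0.0) - h * A[i][j] for j in range(p)] for i in range(p)]
hG = [h * g for g in G]

# Plain iteration X_m = hG + (1 - hA) X_{m-1}, starting from X_0 = 0.
X = [0.0] * p
for _ in range(400):
    X = [a + b for a, b in zip(hG, matvec(D, X))]

# Accelerated: (1 + D)(1 + D^2)...(1 + D^256) hG = X_512 when X_0 = 0.
Y, P = hG[:], D
for f in range(9):
    if f:
        P = matmul(P, P)
    Y = [a + b for a, b in zip(Y, matvec(P, Y))]

exact = [17 / 183, 95 / 183, 124 / 183, 110 / 183]
print(X, Y)
```

Here $N(1 - hA) \approx 1.33$, so the limit of error $[N(1 - hA)]^m N(X_0 - X)$ gives no information. Even so, every root of $1 - hA$ lies between 0 and 1 and the iteration converges. This is the situation, noted in Sec. 6, in which the criterion based on the roots is sharper than the one based on the norm.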

Thus accelerated, the method gives decimal places of accuracy increasing in
geometric instead of arithmetic progression, and is remarkably simple and
straightforward. It is at its best when the roots of $A$ are known to be closely
clustered about unity. A criterion of this is that $\sum (\lambda_i - \bar\lambda)^2$ shall have a small
value, where $\bar\lambda$ is the mean of the $p$ roots $\lambda_i$. This sum of squares equals
$\sum \lambda_i^2 - p$ for a correlation matrix $A$ (since then $\bar\lambda = 1$), and

$$\sum_i \lambda_i^2 = \operatorname{tr}(A^2) = \sum_{i,j} a_{ij}^2 = p + 2\sum_{i<j} a_{ij}^2,$$

so that

$$\sum_i (\lambda_i - \bar\lambda)^2 = 2\sum_{i<j} a_{ij}^2.$$

Smallness of this quantity is favorable not only to this iterative method but
also to those of §§4 and 5.
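The clustering criterion needs only the off-diagonal elements. In this short sketch, Dwyer's correlation matrix of Sec. 8 is used purely as an example. Both sides of the last identity come to 2.12.

```python
import math

A = [[1.0, 0.4, 0.5, 0.6],
     [0.4, 1.0, 0.3, 0.4],
     [0.5, 0.3, 1.0, 0.2],
     [0.6, 0.4, 0.2, 1.0]]
p = len(A)

tr_A2 = sum(A[i][j] * A[j][i] for i in range(p) for j in range(p))   # tr(A^2) = sum of lambda_i^2
spread = tr_A2 - p               # sum of (lambda_i - 1)^2, since the mean root is 1
off_diag = 2 * sum(A[i][j] ** 2 for i in range(p) for j in range(i + 1, p))
print(round(spread, 6), round(off_diag, 6))   # 2.12 2.12
```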

11. Use of the characteristic equation for inversion and for finding
determinants. A method differing greatly from the others is based on the
Cayley-Hamilton theorem that every matrix satisfies its own characteristic equation
[29, p. 23; 4, p. 296]. This is the equation

$$f(\lambda) = \begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} - \lambda \end{vmatrix} = c_p - c_{p-1}\lambda + c_{p-2}\lambda^2 - \cdots + (-1)^{p-1}c_1\lambda^{p-1} + (-1)^p\lambda^p = 0,$$

where $c_r$ $(r = 1, 2, \dots, p)$ is the sum of the products $r$ at a time of the roots,
and also equals the sum of the $r$-rowed principal minors of the matrix $A$.
Substituting $A$ for $\lambda$, which by the Cayley-Hamilton theorem is legitimate,
multiplying by $A^{-1}$ and transposing a term, yields

$$c_pA^{-1} = c_{p-1} - c_{p-2}A + c_{p-3}A^2 - \cdots + (-1)^p c_1A^{p-2} + (-1)^{p+1}A^{p-1}. \tag{11.1}$$

This equation provides a direct method of calculating $A^{-1}$ as soon as the
elementary symmetric functions $c_r$ of the roots of $f(\lambda) = 0$ have been evaluated.
This evaluation may be accomplished by means of Newton's identities [4, p. 243]
connecting the elementary symmetric functions with the power-sums. If $s_r$
is the sum of the $r$th powers of the roots, these formulae give:

$$\begin{aligned} c_1 &= s_1, \\ c_2 &= \tfrac{1}{2}(c_1s_1 - s_2), \\ c_3 &= \tfrac{1}{3}(c_2s_1 - c_1s_2 + s_3), \\ &\;\;\vdots \\ c_p &= \tfrac{1}{p}\left(c_{p-1}s_1 - c_{p-2}s_2 + \cdots + (-1)^{p-1}s_p\right). \end{aligned}$$
The procedure is to calculate in turn $A^2, A^3, \dots, A^{p-1}$, then to obtain the $s$'s
from the diagonals of these matrices, since $s_r = \operatorname{tr}(A^r)$, then to obtain the
elementary symmetric functions $c_1, \dots, c_{p-1}$ of the roots from Newton's formulae,
and to substitute these in the right-hand member of (11.1). It is then only
necessary to find and divide by $c_p$, which equals the determinant of $A$. For
this, and for checking the calculations, there is a choice of methods. We may
find the diagonal of $A^p$ without troubling to compute the whole of this matrix,
from the product $AA^{p-1}$ and also, to provide a comprehensive check, from
$A^{p-1}A$ or possibly from the product of two powers of $A$ of exponents
approximating $p/2$. The sum of these diagonal elements of $A^p$ is $s_p$, which may be
substituted in the last of the Newton formulae above with the quantities
previously found to give $c_p$. An alternative method is to multiply $A$ by its adjoint
$c_pA^{-1}$, which is computed by (11.1), to obtain the determinant $c_p$.
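The procedure can be sketched in a few lines of Python. The sketch assumes an integer matrix so that every step is exact: the divisions in Newton's formulae then come out even, because the $c_r$ are sums of minors of an integer matrix. For simplicity it forms the whole of $A^p$ rather than only its diagonal. The function name `characteristic_inverse` is ours. Applied to the 5-rowed matrix of the illustration that follows, it reproduces the values of $c_1, \dots, c_5$ given there.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


def characteristic_inverse(A):
    """Return (c, adj) with c = [1, c_1, ..., c_p] and adj = c_p A^{-1}, via (11.1).

    A must be a square matrix of integers, so that all arithmetic is exact.
    """
    p = len(A)
    powers = [[[int(i == j) for j in range(p)] for i in range(p)], A]   # A^0, A^1
    for _ in range(2, p + 1):
        powers.append(matmul(powers[-1], A))      # up to A^p, kept whole for simplicity
    s = [None] + [trace(powers[r]) for r in range(1, p + 1)]   # s_r = tr(A^r)
    c = [1]
    for r in range(1, p + 1):
        # Newton: r c_r = c_{r-1} s_1 - c_{r-2} s_2 + ... + (-1)^(r-1) c_0 s_r
        total = sum((-1) ** (i - 1) * c[r - i] * s[i] for i in range(1, r + 1))
        assert total % r == 0
        c.append(total // r)
    # (11.1): c_p A^{-1} = sum over j = 0, ..., p-1 of (-1)^j c_{p-1-j} A^j
    adj = [[sum((-1) ** j * c[p - 1 - j] * powers[j][row][col] for j in range(p))
            for col in range(p)] for row in range(p)]
    return c, adj


A = [[15, 11, 6, -9, -15],
     [1, 3, 9, -3, -8],
     [7, 6, 6, -3, -11],
     [7, 7, 5, -3, -11],
     [17, 12, 5, -10, -16]]
c, adj = characteristic_inverse(A)
print(c)        # [1, 5, 33, 51, 135, -225]
print(adj[0])   # [-207, 64, -124, 111, 171]
```

Multiplying `A` by `adj` gives $c_5 = -225$ times the unit matrix, which is the check by the adjoint mentioned above.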

The total number of multiplications, divisions, and additions is distinctly 
greater by this method than by efficient direct methods such as that of Dwyer 
[7, 9]. On the other hand, this method is straightforward and easily checked; 
the divisions involved are of the simplest character, consisting only of the 
divisions by 2, 3, • • • , p in Newton’s formulae and of the final division of the 
adjoint matrix by one number; and for large matrices it is ideally adapted for 
matrix multiplication by means of punched cards. A further very important 
advantage of this characteristic function method is that it yields considerable 
additional information as a by-product. Not only the determinant of the 
matrix but the sums 6v of the principal minors of each order r are determined. 
Moreover the characteristic equation, whose coefficients would be exceedingly 
difficult to compute directly from definitions for a large matrix, is by this method 
made available for the study of the latent roots, which have great interest in 
themselves for numerous purposes. 

The characteristic function method is applicable whether A is symmetric or 
not. If it is symmetric, the same is true of each of the other matrices appearing 
in the calculation, so that it is necessary to write only about half the elements. 

An illustration using a symmetric matrix has been given by M. D. Bingham [3]. In the illustration below the matrix is not symmetric and has complex double roots and non-linear elementary divisors. Evaluation of the roots by iterative methods would therefore be very slow and laborious, though possible, as shown by Aitken [2]. This is indeed the same example used by Aitken in his discussion. It should be noted, however, that the associated latent vectors are determined along with the roots in the iterative processes. If the roots are instead found directly by solving the characteristic equation, the vectors require the solution of sets of $p - 1$ linear equations.


Let
$$A = \begin{bmatrix} 15 & 11 & 6 & -9 & -15\\ 1 & 3 & 9 & -3 & -8\\ 7 & 6 & 6 & -3 & -11\\ 7 & 7 & 5 & -3 & -11\\ 17 & 12 & 5 & -10 & -16 \end{bmatrix}.$$
Then
$$A^2 = \begin{bmatrix} -40 & -9 & 105 & -9 & -40\\ -76 & -43 & 32 & 44 & 23\\ -55 & -22 & 62 & 20 & -10\\ -61 & -25 & 65 & 20 & -7\\ -40 & -9 & 110 & -14 & -40 \end{bmatrix}, \qquad A^3 = \begin{bmatrix} -617 & -380 & 64 & 499 & 256\\ -260 & -189 & -316 & 355 & 280\\ -443 & -279 & -106 & 415 & 259\\ -464 & -300 & -136 & 439 & 292\\ -617 & -385 & 69 & 499 & 256 \end{bmatrix},$$
$$(A^2)^2 = A^4 = AA^3 \ \text{(check)} = \begin{bmatrix} -1342 & -978 & -2963 & 2444 & 2006\\ 944 & 522 & -1982 & -10 & 503\\ -358 & -333 & -2435 & 1307 & 1334\\ -175 & -243 & -2645 & 1247 & 1355\\ -1312 & -963 & -2978 & 2444 & 1991 \end{bmatrix}.$$
From the diagonals of these matrices,
$$S_1 = 5, \quad S_2 = -41, \quad S_3 = -217, \quad S_4 = -17.$$

Calculating the sum of the diagonal elements only of $A^5$ (on a machine, without listing them separately) from $AA^4$, and also, as a check, from $A^2A^3$, we find $S_5 = 3185$. Newton's formulae then give
$$C_1 = 5, \quad C_2 = 33, \quad C_3 = 51, \quad C_4 = 135, \quad C_5 = -225,$$
the last value being that of the determinant of $A$. We readily find from (11.1)
$$C_5A^{-1} = \begin{bmatrix} -207 & 64 & -124 & 111 & 171\\ -315 & 30 & 195 & -180 & 270\\ -315 & 30 & -30 & 45 & 270\\ -225 & 75 & -75 & 0 & 225\\ -414 & 53 & 52 & -3 & 342 \end{bmatrix}.$$

So far, all results by this method are exact, but the division by 225 introduces recurring decimals and therefore a limited validity for the form
$$A^{-1} = \begin{bmatrix} .9200 & -.2844 & .5511 & -.4933 & -.7600\\ 1.4000 & -.1333 & -.8667 & .8000 & -1.2000\\ 1.4000 & -.1333 & .1333 & -.2000 & -1.2000\\ 1.0000 & -.3333 & .3333 & 0 & -1.0000\\ 1.8400 & -.2356 & -.2311 & .0133 & -1.5200 \end{bmatrix}.$$


The characteristic equation
$$f(\lambda) = \lambda^5 - 5\lambda^4 + 33\lambda^3 - 51\lambda^2 + 135\lambda + 225 = 0$$
may in this case be solved readily, since
$$f(\lambda) = (\lambda + 1)(\lambda^2 - 3\lambda + 15)^2.$$
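Multiplying out confirms the factorization, and the roots follow at once: $\lambda = -1$ and the complex double pair $(3 \pm i\sqrt{51})/2$. The short check below is a sketch. The helpers `poly_mul` and `poly_eval` are ours, and coefficients are listed from the highest power down.

```python
import cmath


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_eval(p, x):
    """Horner evaluation of a coefficient list (highest power first)."""
    result = 0
    for a in p:
        result = result * x + a
    return result


quad = [1, -3, 15]                           # lambda^2 - 3 lambda + 15
f = poly_mul([1, 1], poly_mul(quad, quad))   # (lambda + 1) * quad^2
roots = [-1, (3 + cmath.sqrt(-51)) / 2, (3 - cmath.sqrt(-51)) / 2]
print(f)                                     # [1, -5, 33, -51, 135, 225]
print([abs(poly_eval(f, r)) < 1e-9 for r in roots])   # [True, True, True]
```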

III. Latent Roots and Vectors 

12. Direct and iterative methods. Sometimes the latent roots but not the latent vectors of a matrix are desired. An example is a preliminary study of vibrations in machinery being designed, where the important question is whether any root has a positive real part. It is then only necessary to find the characteristic equation and to work with it by the methods of the theory of equations. The coefficients in the characteristic equation are the sums of the $r$-rowed principal minors ($r = 1, 2, \dots, p$), and are expeditiously found directly from this definition for matrices of four or fewer rows. For large matrices, however, the calculation of so many large overlapping determinants is wasteful of effort, since many virtually equivalent calculations must be done repeatedly. Indeed, calculation by determinants in a great many situations, including the solution of linear equations, is open to this objection. The methods of §11 yield the characteristic function in a manner which, for large matrices, appears to be the best available, excepting perhaps the new method of Samuelson [25a].
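For comparison, the coefficients can be computed directly from the definition as sums of principal minors. The sketch below uses helper names of our own. It evaluates $2^p - 1$ determinants, 31 of them for a 5-rowed matrix, which illustrates why this route becomes wasteful as $p$ grows.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result                  # a row swap changes the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n):
                M[r][k] -= factor * M[col][k]
    return result


def coefficients_by_minors(A):
    """C_r = sum of all r-rowed principal minors of A, for r = 1, ..., p."""
    p = len(A)
    C = [1]
    for r in range(1, p + 1):
        C.append(sum(det([[A[i][j] for j in rows] for i in rows])
                     for rows in combinations(range(p), r)))
    return C


A = [[15, 11, 6, -9, -15], [1, 3, 9, -3, -8], [7, 6, 6, -3, -11],
     [7, 7, 5, -3, -11], [17, 12, 5, -10, -16]]
print([int(c) for c in coefficients_by_minors(A)])   # [1, 5, 33, 51, 135, -225]
```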

When, as is commonly the case, the latent vectors are desired, a straight- 
forward calculation directly from the definitions would require not only setting 
up and solving the characteristic equation, but also the solution, in the case of 
each root, of the set of linear equations in p unknowns whose matrix is obtained 
from the characteristic matrix by substituting the particular root for $\lambda$. It is
this solution of linear equations that aggravates greatly the computational 
labor when direct methods are used. 

An ingenious method has been used by R. A. Fisher [11, pp. 299 ff.]. Fisher starts with a four-rowed determinant whose elements are linear functions of an unknown $\theta$ and calculates its value for selected values of $\theta$. He then finds, by interpolation using divided differences, the largest value of $\theta$ making the determinant zero. The point of the divided difference method is that it avoids the direct calculation of the determinant for more than a few values of $\theta$. It replaces that calculation essentially by building up the fourth-degree polynomial in $\theta$ from its differences, using the fact that the fourth divided differences are constant. The linear equations are then solved in a direct manner. If applied to large matrices this would be very laborious, but it compares favorably with calculation directly from definitions in the manner suggested by reading books on algebra and solid analytic geometry. But even with large matrices Fisher's method may perhaps be the best in certain cases. One such case is when all that is desired is the root of median absolute value and this root is real. Another is when it is desired to find a few real roots that are close together, with numerous others greater and another numerous group less than these. This is because the iterative methods give the real roots in the order of their absolute values, beginning with the greatest. They can also give them in the opposite order if the matrix is first inverted. The Mallock electrical device [22] may be used to calculate determinants, and thus to apply this method.
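The divided-difference idea can be sketched as follows. We assume a determinant of the form $|A - \theta B|$ with four rows. The matrices below are an arbitrary illustration, not Fisher's data: with $B = I$ the largest root is $(5+\sqrt5)/2$. Five evaluations fix the quartic, and its Newton form is then evaluated wherever needed. A root is located by bisection inside a bracket that we assume contains a sign change.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (fine for 4 rows)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def g(theta, A, B):
    """The determinant |A - theta B|, a polynomial of degree 4 in theta."""
    return det([[a - theta * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)])


def divided_differences(xs, ys):
    """Newton coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    coef = [Fraction(y) for y in ys]
    for level in range(1, len(xs)):
        for i in range(len(xs) - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the interpolating polynomial in Newton form."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result


def bisect_root(xs, coef, lo, hi, steps=60):
    """Bisection on the interpolated quartic; assumes a sign change in [lo, hi]."""
    f_lo = newton_eval(xs, coef, lo)
    for _ in range(steps):
        mid = (lo + hi) / 2
        f_mid = newton_eval(xs, coef, mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


A = [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]
B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
xs = [0, 1, 2, 3, 4]
coef = divided_differences(xs, [g(x, A, B) for x in xs])
print(coef[4])                                      # 1, the constant fourth divided difference |B|
print(round(bisect_root(xs, coef, 3.0, 4.0), 6))    # 3.618034
```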

If $A$ and $B$ are $p$-rowed matrices and $B$ is non-singular, the determinantal equation $|A - \lambda B| = 0$ is equivalent to $|AB^{-1} - \lambda| = 0$ and also to $|B^{-1}A - \lambda| = 0$. The column vectors $X_i$ satisfying $(A - \lambda_iB)X_i = 0$ also satisfy $(B^{-1}A - \lambda_i)X_i = 0$. The row vectors $V_i$ satisfying $V_i(A - \lambda_iB) = 0$ also satisfy $V_i(AB^{-1} - \lambda_i) = 0$. If $A$ and $B$ are symmetric, $V_i = X_i'$. Thus any problem of this type is reducible to that of finding latent roots and vectors, upon calculating $B^{-1}$ by any method and multiplying in either order by $A$.
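A 2-rowed sketch of this reduction, with illustrative symmetric matrices of our own choosing. The roots and vectors of $B^{-1}A$ are found in closed form, and the vectors are checked against $(A - \lambda B)X = 0$. Because $A$ and $B$ are symmetric, the transposed vectors are also checked as row vectors of $AB^{-1}$. The helper `eig2` assumes real roots and a non-zero upper off-diagonal element.

```python
import math


def inv2(B):
    """Inverse of a non-singular 2 x 2 matrix."""
    (a, b), (c, d) = B
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def eig2(M):
    """Roots and column vectors of a 2 x 2 matrix; assumes real roots, M[0][1] != 0."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    # (b, lam - a) satisfies the first row of (M - lam I) x = 0
    return [(lam, [b, lam - a]) for lam in ((tr + disc) / 2, (tr - disc) / 2)]


A = [[4.0, 1.0], [1.0, 3.0]]
B = [[2.0, 0.0], [0.0, 1.0]]
pairs = eig2(matmul2(inv2(B), A))       # latent roots and vectors of B^{-1} A
for lam, X in pairs:
    print(round(lam, 6))                # 3.366025, then 1.633975
```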

The fundamental iterative method for finding latent roots and vectors of $A$ begins with an arbitrary matrix $X_0$ of a single column. This column vector is premultiplied by $A$ to obtain a new column vector $X_1$. If, as is possible though unlikely, the elements of $X_1$ are proportional to those of $X_0$, they constitute one of the latent vectors of $A$, and the factor of proportionality is the corresponding root, for then $X_0$ and $X_1$ are solutions of the matrix equation $(A - \lambda)X = 0$. It should be observed that the latent vector is determined only to within an arbitrary scalar factor of proportionality. We may sometimes find it convenient to normalize the vector by choosing this factor so that the sum of the squares of the elements, which equals the square of the norm, is unity.

If $X_1$ is not proportional to $X_0$, the operation may be repeated by calculating $X_2 = AX_1$, then $X_3 = AX_2$, and so on. Suppose these vectors are normalized, or divided by, say, their respective first elements. Then, in the cases of greatest practical importance, the other elements will gradually approach stable values which determine one of the latent vectors, and the successive factors of proportionality will approach the corresponding root. The convergence of this process is, however, apt to be rather slow. Fortunately there are several known ways of accelerating it.
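A minimal sketch of this process that divides each trial vector by its first element, as suggested above. The matrix and starting vector are illustrative assumptions. The symmetric matrix used has roots $2 + \sqrt2$, $2$ and $2 - \sqrt2$, so the error shrinks roughly by the factor $2/(2+\sqrt2) \approx 0.59$ per iteration. That factor shows why convergence can be slow.

```python
def power_iteration(A, x0, steps):
    """Repeated premultiplication by A; returns the ratios and the last vector.

    Each new vector is divided by its first element, so the recorded ratio is
    the factor of proportionality in the first position.
    """
    x = list(x0)
    ratios = []
    for _ in range(steps):
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        ratios.append(y[0] / x[0])
        x = [yi / y[0] for yi in y]
    return ratios, x


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
ratios, vector = power_iteration(A, [1.0, 0.0, 0.0], 80)
print(ratios[:4])          # slow approach to the largest root
print(ratios[-1], vector)  # about 3.4142 and (1, -1.4142, 1)
```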

Matrix-squaring is the first of these methods of accelerating convergence [17, 19]. It is clear that $X_t = A^tX_0$. Consequently one application of the iterative process with $A^t$ is equivalent to $t$ iterations with $A$. It is relatively easy to square $A$, and then by repeated squarings to form $A^4$, $A^8$, $A^{16}$, etc. The economical limit of this process is determined partly by the necessity of retaining more and more digits in the successively higher powers. Up to a point not yet determined exactly, however, it presents very great advantages. For proceeding to the determination of latent roots of other than the maximum absolute value, with their associated vectors, this method lends itself to further short-cuts [17, 2], which seem to give it an advantage over an older method [13].
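A sketch of the squaring device with an illustrative integer matrix. Four squarings give $A^{16}$, and one multiplication of $X_0$ by $A^{16}$ reproduces sixteen ordinary iterations exactly. The growth of the largest element hints at the extra digits mentioned above. The function names are ours.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def power_by_squaring(A, squarings):
    """Return A^(2**squarings) using `squarings` matrix multiplications."""
    P = A
    for _ in range(squarings):
        P = matmul(P, P)
    return P


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
x0 = [1, 0, 0]
A16 = power_by_squaring(A, 4)          # A^2, A^4, A^8, A^16
x16 = x0
for _ in range(16):                    # sixteen ordinary iterations
    x16 = matvec(A, x16)
print(matvec(A16, x0) == x16)          # True
print(max(abs(e) for row in A16 for e in row))   # element size grows with the power
```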

Another method of accelerating convergence, introduced by A. C. Aitken, and referred to by him as the $\delta^2$-method, uses the ratio $\phi(t)$ of an element of $X_{t+1}$ to the corresponding element of $X_t$ in the function
$$\frac{\phi(t+1)\,\phi(t-1) - [\phi(t)]^2}{\phi(t+1) - 2\phi(t) + \phi(t-1)},$$
which converges rapidly toward the root $\lambda_1$ of greatest absolute value. If a constant $c$ is subtracted from all three of the quantities $\phi(t+1)$, $\phi(t)$ and $\phi(t-1)$ before computing the foregoing function, the result is simply diminished by $c$. This fact greatly reduces the computational labor, since the decimal places of $\lambda_1$ already determined are common to all three.
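A sketch of the $\delta^2$-method applied to ratios of first elements of exact integer trial vectors. The illustrative symmetric matrix has largest root $2 + \sqrt2$. The example also checks the shift property numerically: subtracting $c = 3.41$ first and adding it back afterwards gives the same estimate.

```python
def aitken(phi_prev, phi, phi_next):
    """Aitken's delta-squared extrapolation of three consecutive ratios."""
    return (phi_next * phi_prev - phi * phi) / (phi_next - 2 * phi + phi_prev)


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
x = [1, 0, 0]
first_elements = [x[0]]
for _ in range(13):                      # exact integer trial vectors X_1 .. X_13
    x = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    first_elements.append(x[0])
# phi[t] = (first element of X_{t+1}) / (first element of X_t)
phi = [first_elements[t + 1] / first_elements[t] for t in range(13)]

root = 2 + 2 ** 0.5
estimate = aitken(phi[10], phi[11], phi[12])
print(abs(phi[12] - root), abs(estimate - root))   # raw ratio vs. extrapolation

c = 3.41                                 # digits already settled
shifted = aitken(phi[10] - c, phi[11] - c, phi[12] - c) + c
```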

If $A$ is symmetric and we form the scalar products of $X_t = A^tX_0$ with itself and with $X_{t+1}$, we have
$$X_t'X_t = X_0'A^{2t}X_0, \qquad X_{t+1}'X_t = X_0'A^{2t+1}X_0.$$
The ratio of these two scalars gives an estimate of $\lambda_1$ at an early stage. Judged by the ratios of consecutive elements in a given place in the trial vectors, an estimate of equal accuracy would not be reached until a later stage of convergence, corresponding in fact to twice as many iterations. Aitken has pointed out the great value of this procedure for finding the root (but not the latent vector). He has also extended the idea to asymmetric matrices. There a complication arises because each root has two latent vectors, one determined by premultiplying by $A$, the other by $A'$.
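A numerical illustration with the same kind of illustrative symmetric matrix (largest root $2 + \sqrt2$). It compares the scalar-product estimate at stage $t$ with the element ratio at stages $t$ and $2t$. The first and last errors come out of similar size, as the text indicates. The estimate never exceeds $\lambda_1$.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]        # symmetric
X = [[1, 0, 0]]
for _ in range(17):
    X.append(matvec(A, X[-1]))                     # exact integer X_0 .. X_17

root = 2 + 2 ** 0.5
t = 8
scalar_estimate = dot(X[t + 1], X[t]) / dot(X[t], X[t])
element_ratio_t = X[t + 1][0] / X[t][0]
element_ratio_2t = X[2 * t + 1][0] / X[2 * t][0]
print(abs(scalar_estimate - root), abs(element_ratio_t - root),
      abs(element_ratio_2t - root))
```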

The comprehensive paper [2] of Aitken gives an extremely valuable account 
of the whole problem and processes of finding the latent roots and vectors, including a survey of the various cases arising when there are multiple roots,
complex roots, and non-linear elementary divisors. This paper should be studied 
carefully by anyone with any substantial numerical problem of this kind. 

A method using rotations of two variables at a time has been devised by T. L. 
Kelley [21]. 

The remainder of this paper will be concerned with some results, believed 
to be new, by which useful upper limits can be set for the errors of the results 
yielded by iteration for latent roots and vectors of a symmetric matrix. To 
find such limits of error for asymmetric matrices appears to be a much more 
difficult and as yet unsolved problem. 


13. Accuracy of iteration with symmetric matrices. If $A$ is symmetric, as it is in most statistical problems (though with some exceptions, as in [18]), the roots are all real and the elementary divisors are linear. Moreover there exist an orthogonal matrix $H$ and a diagonal matrix
$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots\\ 0 & \lambda_2 & 0 & \cdots\\ 0 & 0 & \lambda_3 & \cdots\\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
such that
$$A = H\Lambda H'. \tag{13.1}$$
Since $H$ is orthogonal, $HH' = I$, and therefore
$$\Lambda = H'AH, \qquad A^t = H\Lambda^tH'. \tag{13.2}$$

We may associate with the successive trial vectors $X_t = AX_{t-1} = A^tX_0$ the vectors $Y_t = H'X_t$; then $X_t = HY_t$. From these equations and the second of (13.2) it is clear that
$$Y_t = H'X_t = H'A^tX_0 = \Lambda^tH'X_0 = \Lambda^tY_0.$$
Hence, if the elements of $Y_0$ are $y_i$,
$$Y_t = \begin{bmatrix} y_1\lambda_1^t\\ y_2\lambda_2^t\\ \vdots\\ y_p\lambda_p^t \end{bmatrix}.$$

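Now let $a_{kt}$ be the scalar product of $X_t$ and $X_{t-k}$:
$$a_{kt} = X_t'X_{t-k} = Y_t'Y_{t-k} = \sum_i y_i^2\lambda_i^{2t-k}; \tag{13.3}$$
and let
$$v_{kt} = \frac{a_{kt}}{a_{0t}} = \frac{\sum_i y_i^2\lambda_i^{2t-k}}{\sum_i y_i^2\lambda_i^{2t}}. \tag{13.4}$$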


If $A$ has a negative root, this fact will become evident after a certain stage in the iteration used to obtain this root, through an alternation of sign of the numbers in any one position in consecutive trial vectors. However, $A^2$ has only positive roots, which are the squares of the roots of $A$, and it has the same latent vectors as $A$. As pointed out in §12, it may well be calculated anyhow. Hence we shall have results of sufficient generality for real symmetric matrices if we assume that all roots of the matrix with which we work are positive or zero, i.e. that it is positive definite or semi-definite. Let us choose the notation so that
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0.$$

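Then if $k > 0$,
$$a_{0t} = \sum_i y_i^2\lambda_i^{2t} \le \lambda_1^k\sum_i y_i^2\lambda_i^{2t-k} = \lambda_1^ka_{kt}.$$
Hence, by (13.4),
$$\lambda_1 \ge v_{kt}^{-1/k}. \tag{13.5}$$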

It is known [23] that if $a_1, \dots, a_p$, $c_1, \dots, c_p$ are any positive numbers, the function
$$\left(\frac{c_1a_1^k + \cdots + c_pa_p^k}{c_1 + \cdots + c_p}\right)^{1/k}$$
increases monotonically with $k$. Putting $c_i = y_i^2\lambda_i^{2t}$, $a_i = \lambda_i^{-1}$ if $\lambda_i \ne 0$, and $c_i = a_i = 0$ if $\lambda_i = 0$, we find that the right-hand member of (13.5) decreases monotonically as $k$ increases. Hence the best of these lower bounds for $\lambda_1$ is that corresponding to the least value of $k$ that can be used, namely $k = 1$. Consequently the lower bound to be recommended for $\lambda_1$ is given by
$$\lambda_1 \ge \frac{1}{v_{1t}} = \frac{X_t'X_t}{X_t'X_{t-1}}. \tag{13.6}$$

From (13.4) it is easily seen that this lower bound approaches $\lambda_1$ when $t$ increases, provided $y_1 \ne 0$.





An upper bound for $\lambda_1$ is available from the fact that the sum of the $t$th powers of the roots is the trace of $A^t$. Since we assume all $\lambda_i \ge 0$, this gives
$$\lambda_1 \le (\operatorname{tr} A^t)^{1/t}.$$
That this upper limit converges to $\lambda_1$ when $t$ increases is easily seen from (0.7) upon consideration of $\log(\sum_i \lambda_i^t)^{1/t}$.

A lower limit alternative to that of (13.6) is also available from $\operatorname{tr}(A^t)$, and likewise converges to $\lambda_1$. Indeed, since $\lambda_1$ is the greatest root, we have
$$\lambda_1 \ge (\operatorname{tr} A^t/p)^{1/t}.$$
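The three bounds can be evaluated together. The sketch below assumes an illustrative positive definite symmetric matrix with roots $2 + \sqrt2$, $2$, $2 - \sqrt2$. It uses exact integer trial vectors and powers, and the function name `root_bounds` is ours. The interval between the best lower bound and the upper bound narrows as $t$ grows.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def root_bounds(A, x0, t):
    """Bounds for the greatest root of a positive (semi-)definite symmetric A.

    Returns (lower bound (13.6), lower bound from the trace, upper bound from the trace).
    """
    p = len(A)
    x_prev, x_t = None, list(x0)
    for _ in range(t):                            # t >= 1
        x_prev, x_t = x_t, matvec(A, x_t)
    lower_136 = dot(x_t, x_t) / dot(x_t, x_prev)   # a_0t / a_1t
    P = A
    for _ in range(t - 1):
        P = matmul(P, A)                           # A^t
    tr = sum(P[i][i] for i in range(p))
    return lower_136, (tr / p) ** (1 / t), tr ** (1 / t)


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
for t in (4, 8, 16):
    print(t, root_bounds(A, [1, 0, 0], t))
```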

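We now seek limits of accuracy for the latent vector corresponding to $\lambda_1$ and estimated by $X_t$. Call this vector $X$ and define $Y = H'X$. Then $\lim Y_t^* = Y^*$, where $Y_t^*$ and $Y^*$ are the normalized forms of $Y_t$ and $Y$. $Y^*$ has as its $i$th element
$$\lim_{t\to\infty} \frac{y_i\lambda_i^t}{\left(\sum_j y_j^2\lambda_j^{2t}\right)^{1/2}}.$$
If $y_1 \ne 0$ and $\lambda_1 > \lambda_2$, this limit is $\pm1$ if $i = 1$, and is otherwise 0. We take the value of $Y$ to be
$$Y = \begin{bmatrix} 1\\ 0\\ \vdots\\ 0 \end{bmatrix}$$
in this case. If $\lambda_1$ is a multiple root, the limit of $X_t^*$ will depend on the initial values $y_i$.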

A useful measure of the closeness of approach of $X_t$ to $X$ is the "correlation coefficient"
$$r_t = X^{*\prime}X_t^* = Y^{*\prime}Y_t^* = \frac{y_1\lambda_1^t}{\left(\sum_i y_i^2\lambda_i^{2t}\right)^{1/2}}.$$
This obviously approaches unity as $t$ increases if $y_1 \ne 0$ and $\lambda_1$ is a simple root. It does so also if $\lambda_1 = \lambda_2 = \cdots = \lambda_s > \lambda_{s+1}$ and we arrange our definitions so that $y_1 \ne 0$ and $y_2 = \cdots = y_s = 0$. In terms of the notation previously introduced, $r_t = y_1\lambda_1^t/a_{0t}^{1/2}$. The sum of the squares of the differences of corresponding elements of the normalized vectors $X^*$ and $X_t^*$, i.e. $\sum(X^* - X_t^*)^2$, is $2(1 - r_t)$. It therefore approaches zero as $r_t$ approaches unity. We shall seek for $r_t$ a lower limit approaching unity as $t$ increases.

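Let us now put
$$w_{it} = \frac{y_i^2\lambda_i^{2t}}{a_{0t}}.$$
Then $r_t^2 = w_{1t}$ and $\sum_i w_{it} = 1$. For $k \ge 1$,
$$a_{kt} = \sum_i y_i^2\lambda_i^{2t-k} \ge y_2^2\lambda_2^{2t-k} + \cdots + y_p^2\lambda_p^{2t-k} \ge \lambda_2^{-k}(y_2^2\lambda_2^{2t} + \cdots + y_p^2\lambda_p^{2t})$$
$$= \lambda_2^{-k}a_{0t}(w_{2t} + \cdots + w_{pt}) = \lambda_2^{-k}a_{0t}(1 - w_{1t}) = \lambda_2^{-k}a_{0t}(1 - r_t^2),$$
so that
$$r_t^2 \ge 1 - \frac{\lambda_2^ka_{kt}}{a_{0t}} = 1 - v_{kt}\lambda_2^k.$$
This unfortunately is not a very useful lower bound for $r_t^2$, since as $t$ increases it approaches $1 - (\lambda_2/\lambda_1)^k$, not unity.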

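A more satisfactory result is obtained as follows. Let $\eta_i = \lambda_i^{-1}$. Then
$$v_{kt} = \sum_{i=1}^p w_{it}\eta_i^k.$$
For any value of $t$ we may consider a distribution of a variate taking the positive values $\eta_1, \dots, \eta_p$ with the positive weights, or probabilities, $w_{it}$. The $k$th moment of this distribution about 0 is $v_{kt}$. In particular the first moment is $v_{1t}$, and is evidently at least equal to $\eta_1$, which is the least of the $\eta_i$. The standard deviation is $\sigma = (v_{2t} - v_{1t}^2)^{1/2}$. As $t$ increases, $v_{1t}$ will approach $\eta_1$ and $\sigma$ will approach zero. Hence, if $\lambda_1 > \lambda_2$, a stage will eventually be reached at which $v_{1t} < \eta_2$. Let
$$\kappa = \frac{\eta_2 - v_{1t}}{\sigma}.$$
Since $\eta_i - v_{1t} \ge \kappa\sigma$ for every $i \ge 2$, the Tchebychef-Bienaymé inequality gives
$$w_{2t} + \cdots + w_{pt} \le \frac{1}{\kappa^2} = \frac{v_{2t} - v_{1t}^2}{(\eta_2 - v_{1t})^2},$$
and therefore
$$r_t^2 = w_{1t} \ge 1 - \frac{v_{2t} - v_{1t}^2}{(\eta_2 - v_{1t})^2},$$
provided $t$ is large enough so that $v_{1t} < \eta_2$. This lower bound approaches unity, as desired, when $t$ increases.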

If $\lambda_1 = \lambda_2 = \cdots = \lambda_s > \lambda_{s+1}$, the same proof shows that
$$w_{1t} + \cdots + w_{st} \ge 1 - \frac{v_{2t} - v_{1t}^2}{(\eta_{s+1} - v_{1t})^2},$$
provided $v_{1t} < \eta_{s+1}$. The left member is the square of the correlation of $X_t$ with the latent vector, among the $s$-parameter family corresponding to the multiple root, for which the correlation is a maximum.
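The Tchebychef-Bienaymé bound needs only three scalar products of trial vectors. The sketch below uses helper names of our own. The upper bound for $\lambda_2$ is simply supplied: here it is the known value $\lambda_2 = 2$ of an illustrative matrix with roots $2 + \sqrt2$, $2$, $2 - \sqrt2$. In practice it would come from the deflated matrix discussed next. Exact fractions avoid cancellation in $v_{2t} - v_{1t}^2$. Any valid upper bound for $\lambda_2$ may be used, since a smaller $\eta_2$ only weakens the bound.

```python
from fractions import Fraction


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def correlation_bound(A, x0, t, lambda2_upper):
    """Lower bound for r_t^2 = w_1t (integer A and x0, t >= 2); None if v_1t >= eta_2."""
    X = [list(x0)]
    for _ in range(t):
        X.append(matvec(A, X[-1]))
    a0, a1, a2 = (dot(X[t], X[t - k]) for k in (0, 1, 2))
    v1, v2 = Fraction(a1, a0), Fraction(a2, a0)
    eta2 = 1 / Fraction(lambda2_upper)          # a lower bound for eta_2
    if v1 >= eta2:
        return None
    return 1 - (v2 - v1 * v1) / (eta2 - v1) ** 2


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
t = 6
bound = correlation_bound(A, [1, 0, 0], t, 2)
print(float(bound))
```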

In order to utilize these results we need a lower bound for $\eta_2$, or for $\eta_{s+1}$. In case $\lambda_1 \ne \lambda_2$ this requires an upper bound for $\lambda_2$. Such an upper bound may be found at the next stage by working with the reduced or "deflated" matrix used in [17]. This is $A_1 = A - \lambda_1XX'$, where $X$ is the normalized latent vector corresponding to $\lambda_1$; and $\lambda_2 \le (\operatorname{tr} A_1^t)^{1/t}$.
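A sketch of the deflation step for an illustrative symmetric matrix whose roots are $2 + \sqrt2$, $2$, $2 - \sqrt2$. Here $\lambda_1$ and $X$ are estimated by normalized iteration and a Rayleigh quotient. Then $A_1 = A - \lambda_1XX'$ is formed, and $(\operatorname{tr} A_1^t)^{1/t}$ with an even $t$ gives the upper bound for $\lambda_2$. An even power keeps the trace an upper bound even if rounding leaves $A_1$ with a tiny negative root. The function names are ours.

```python
import math


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dominant_pair(A, x0, steps=100):
    """Normalized iteration; returns (Rayleigh-quotient root estimate, unit vector)."""
    x = list(x0)
    for _ in range(steps):
        y = matvec(A, x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return sum(a * b for a, b in zip(x, matvec(A, x))), x


def deflate(A, lam, x):
    """A_1 = A - lam * X X' for a unit vector X."""
    n = len(A)
    return [[A[i][j] - lam * x[i] * x[j] for j in range(n)] for i in range(n)]


def trace_power_bound(M, t):
    """(tr M^t)^(1/t); for even t this bounds the largest |root| of symmetric M."""
    P = M
    for _ in range(t - 1):
        P = matmul(P, M)
    return sum(P[i][i] for i in range(len(M))) ** (1.0 / t)


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
lam1, x1 = dominant_pair(A, [1.0, 0.0, 0.0])
A1 = deflate(A, lam1, x1)
print(lam1, trace_power_bound(A1, 8))   # first root, then an upper bound for the second
lam2, _ = dominant_pair(A1, [1.0, 0.0, 0.0])
```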

We have arrived at a definite lower limit for $r_t$ which approaches unity as the iterative process proceeds, and we have found for $\lambda_1$ upper and lower bounds converging to it. This solves the troublesome problem of the degree of accuracy obtained by stopping at any stage of the iteration for the greatest root and the associated latent vector. It would be possible to go on to find from these results appropriate inequalities for $A_1$. Repeating the above arguments would then give inequalities for $\lambda_2$ and the second latent vector. The same would follow likewise for the second reduced matrix $A_2$, and for the further roots, vectors, and reduced matrices in this cyclic order. These steps may well be taken by the computer who has mastered the above argument in connection with a numerical example.
ON SOLUTIONS OF THE BEHRENS-FISHER PROBLEM,

BASED ON THE t-DISTRIBUTION

By Henry Scheffé
Princeton University

1. The Problem. The problem [1, 2] is the interval estimation¹ of the difference of the means of two normal populations when the ratio of the variances of the populations is unknown. The reader who wishes to see the present solution before considering theoretical details will find it recapitulated in the Summary at the end, and will want to refer to the following notation. $(x_1, x_2, \dots, x_m)$ and $(y_1, y_2, \dots, y_n)$ are random samples from normal populations with means $\alpha$ and $\beta$, and variances $\mu$ and $\nu$, respectively. Define $\delta = \alpha - \beta$. We assume $m \le n$, and that the variates in each sample are in the order of observation, or else have been randomized.

Recently Neyman [3] has called attention to a solution which we shall designate as (B), and which is a special case of an unpublished solution of Bartlett². It will be simpler to describe (B) later, but we mention now that it has the following advantages: (i) its validity does not depend on the values of unknown parameters, (ii) the required computations are simple, and (iii) only existing tables are needed, namely the widely available Fisher $t$-tables. An unsatisfactory aspect of (B) is that when the sample sizes are unequal, $n - m$ of the variates $y_i$ are completely discarded. The solution below shares with (B) the advantages (i), (ii), (iii). Indeed, it is identical with (B) when $n = m$, but when $n \ne m$ it is free from the above objection.

2. Simple Solution. We begin with a simple restricted approach; later we will review the result from a somewhat broader standpoint. Suppose random variables $d_1, d_2, \dots, d_m$ are independently normally distributed with mean $\delta$ and variance $\sigma^2$, and $L$ and $Q$ are defined from
$$L = \sum_{i=1}^m d_i/m, \qquad Q = \sum_{i=1}^m (d_i - L)^2.$$
Then $m^{1/2}(L - \delta)/\sigma$ and $Q/\sigma^2$ are independently distributed. The former is a normal variable with zero mean and unit variance. The latter is $Q/\sigma^2 = \chi^2_{m-1}$, where $\chi^2_k$ is a generic notation for a random variable distributed according to the $\chi^2$-law with $k$ degrees of freedom. The quotient


¹ We treat the problem from the standpoint of confidence intervals, rather than significance tests. When the former are available for $\delta$, so is a whole class of the latter, namely tests of any hypothesis $\delta = \delta_0$, for all $\delta_0$. Furthermore, questions of the existence of "best" tests and "best" confidence intervals are closely related [5a].

² How far Bartlett followed the path of this paper is not clear from the brief mention of his results by Welch [4], except that he did establish the sufficiency of certain orthogonality conditions.
$$m^{1/2}(L - \delta)/[Q/(m-1)]^{1/2} = t_{m-1},$$
where $t_k$ denotes generically a variable having the $t$-distribution with $k$ degrees of freedom. Define $t_{k,\epsilon}$ from
$$\Pr(-t_{k,\epsilon} \le t_k \le t_{k,\epsilon}) = \epsilon. \tag{1}$$
Then a set of confidence intervals for $\delta$ with confidence coefficient $\epsilon$ is
$$|\delta - L| \le t_{m-1,\epsilon}\{Q/[m(m-1)]\}^{1/2}. \tag{2}$$

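Denote by $E(l)$ the expected length of the confidence interval (2):
$$E(l) = 2t_{m-1,\epsilon}[m(m-1)]^{-1/2}\sigma E[(Q/\sigma^2)^{1/2}],$$
$$E(l) = t_{m-1,\epsilon}c_{m-1}\sigma/m^{1/2}, \tag{3}$$
where
$$c_k = 2k^{-1/2}E(\chi_k) = (8/k)^{1/2}\Gamma(\tfrac12k + \tfrac12)/\Gamma(\tfrac12k).$$
The symmetrical choice (1) of the limits on the $t$-distribution minimizes (3).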
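The constant $c_k$ is conveniently computed through log-gamma functions. The following sketch (function names ours) evaluates $c_k$ and the expected length (3). The percentage point $t_{m-1,\epsilon}$ and $\sigma$ are inputs the user must supply, e.g. from a $t$-table. The sketch also shows numerically that $c_k$ rises toward its limiting value 2 from below, a fact used in section 7.

```python
import math


def c(k):
    """c_k = (8/k)^(1/2) * Gamma(k/2 + 1/2) / Gamma(k/2)."""
    return math.sqrt(8.0 / k) * math.exp(math.lgamma(k / 2 + 0.5) - math.lgamma(k / 2))


def expected_length(t_value, m, sigma):
    """E(l) of (3): t_{m-1,eps} * c_{m-1} * sigma / sqrt(m)."""
    return t_value * c(m - 1) * sigma / math.sqrt(m)


print([round(c(k), 4) for k in (1, 2, 5, 10, 40)])
```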

We consider using in connection with the confidence intervals (2) linear functions
$$d_i = x_i - \sum_{j=1}^n c_{ij}y_j, \qquad i = 1, 2, \dots, m. \tag{4}$$
The variables $d_i$ have a multivariate normal distribution. Necessary and sufficient conditions that the $d_i$ all have the same mean $\delta$, equal variances and zero covariances are easily found to be
$$\sum_{j=1}^n c_{ij} = 1, \qquad \sum_{k=1}^n c_{ik}c_{jk} = c^2\delta_{ij}, \tag{5}$$
where $\delta_{ii} = 1$, $\delta_{ij} = 0$ if $i \ne j$. If (4) are used in (2), $E(l)$ is given by (3) with $\sigma^2 = \mu + c^2\nu$. Hence to minimize $E(l)$ we must find an $m \times n$ matrix $C = (c_{ij})$ satisfying (5), and for which $c^2$ is minimum. The minimum value of $c^2$ is $m/n$: this is easily proved by the use of vector algebra.

Let $\gamma_i$ be the $i$th row of $C$, and let $\lambda$ be the $1 \times n$ matrix $(1, 1, \dots, 1)$. Denote the transpose of a matrix by a prime. Then the conditions (5) read
$$\gamma_i\lambda' = 1, \qquad \gamma_i\gamma_j' = c^2\delta_{ij}. \tag{6}$$
First suppose vectors $\gamma_1, \gamma_2, \dots, \gamma_m$ satisfy (6). Then it is possible to adjoin $n - m$ orthogonal vectors $\gamma_{m+1}, \dots, \gamma_n$, so that the complete set satisfies the second group of conditions (6). Since this set is a basis in $n$-space,
$$\lambda = \sum_{k=1}^n g_k\gamma_k,$$
where the $g_k$ are scalars. Now
$$1 = \gamma_i\lambda' = \sum_{k=1}^n g_k\gamma_i\gamma_k' = g_ic^2, \qquad i = 1, 2, \dots, m,$$
and thus $g_1 = g_2 = \cdots = g_m = c^{-2}$. But
$$n = \lambda\lambda' = \sum_{k=1}^n g_k^2\gamma_k\gamma_k' = mc^{-2} + \sum_{k=m+1}^n g_k^2c^2 \ge mc^{-2},$$
and hence $c^2 \ge m/n$. On the other hand this lower bound for $c^2$ may be attained. Take any set $\beta_1, \beta_2, \dots, \beta_m$ of orthogonal vectors with norms $m/n$, that is, $\beta_i\beta_j' = \delta_{ij}m/n$. Rotate them so that their equal-angles vector $X = (n/m)(\beta_1 + \beta_2 + \cdots + \beta_m)$ coincides with $\lambda$. Then $\lambda = XS$, where $SS' = I$. For $\gamma_i = \beta_iS$,
$$\gamma_i\lambda' = \beta_iSS'X' = \beta_iX' = 1, \qquad \gamma_i\gamma_j' = \beta_iSS'\beta_j' = \beta_i\beta_j' = \delta_{ij}m/n,$$
so that equations (6) are satisfied with $c^2 = m/n$.

An especially neat solution of this minimum problem was obtained by the above method; its validity may easily be verified directly. It is
$$c_{ij} = \begin{cases} \delta_{ij}(m/n)^{1/2} - (mn)^{-1/2} + 1/n, & j \le m,\\ 1/n, & j > m. \end{cases}$$
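The direct verification is easily mechanized. This sketch (floating point, our own function names) builds $C$ for given $m \le n$ and checks conditions (5) with $c^2 = m/n$. When $m = n$ the matrix reduces to the identity, i.e. to solution (B).

```python
import math


def scheffe_coefficients(m, n):
    """The m x n matrix (c_ij) displayed above; assumes 1 <= m <= n."""
    r, s = math.sqrt(m / n), 1 / math.sqrt(m * n)

    def entry(i, j):
        if j < m:                        # j <= m in the 1-based notation of the text
            return (r if i == j else 0.0) - s + 1 / n
        return 1 / n

    return [[entry(i, j) for j in range(n)] for i in range(m)]


def check_conditions(C, tol=1e-12):
    """Return (rows sum to 1, C C' = c^2 I with c^2 = m/n)."""
    m, n = len(C), len(C[0])
    sums_ok = all(abs(sum(row) - 1) < tol for row in C)
    gram_ok = all(
        abs(sum(a * b for a, b in zip(C[i], C[j])) - (m / n if i == j else 0)) < tol
        for i in range(m) for j in range(m))
    return sums_ok, gram_ok


print(check_conditions(scheffe_coefficients(3, 5)))   # (True, True)
```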

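Then
$$d_i = x_i - (m/n)^{1/2}y_i + (mn)^{-1/2}\sum_{j=1}^m y_j - \sum_{j=1}^n y_j/n,$$
and $L$ and $Q$ become simply $L = \bar x - \bar y$ and
$$Q = \sum_{i=1}^m (u_i - \bar u)^2, \tag{7}$$
where
$$\bar x = \sum_{i=1}^m x_i/m, \quad \bar y = \sum_{i=1}^n y_i/n, \quad u_i = x_i - (m/n)^{1/2}y_i, \quad \bar u = \sum_{i=1}^m u_i/m. \tag{8}$$
We may now write (2) as³
$$|\delta - (\bar x - \bar y)| \le t_{m-1,\epsilon}\{Q/[m(m-1)]\}^{1/2}. \tag{9}$$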

The solution (B) mentioned at the beginning consists of taking $c_{ij} = \delta_{ij}$ in (4), so that the conditions (5) are satisfied with $c^2 = 1$. Hence for both (B) and (9) the expected length of the confidence interval is given by (3), but with $\sigma^2 = \mu + \nu$ for (B), while $\sigma^2 = \mu + (m/n)\nu$ for (9).
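Solution (9) needs only sums and one tabled percentage point. The sketch below assumes the samples are already in order of observation (or randomized) and that the caller supplies $t_{m-1,\epsilon}$. The value 2.571 used here is the usual two-sided 95 per cent point for 5 degrees of freedom. The data are invented for illustration.

```python
import math


def scheffe_interval(x, y, t_value):
    """Confidence limits (9) for delta = alpha - beta.

    x has m values, y has n >= m values; t_value is t_{m-1,eps} from (1).
    """
    m, n = len(x), len(y)
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= n; label the smaller sample x")
    xbar, ybar = sum(x) / m, sum(y) / n
    # (8): u_i pairs x_i with the first m of the y's
    u = [xi - math.sqrt(m / n) * yi for xi, yi in zip(x, y)]
    ubar = sum(u) / m
    Q = sum((ui - ubar) ** 2 for ui in u)            # (7)
    half_width = t_value * math.sqrt(Q / (m * (m - 1)))
    L = xbar - ybar
    return L - half_width, L + half_width


x = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3]              # m = 6
y = [9.0, 9.5, 9.2, 8.8, 9.6, 9.1, 9.4, 9.3]        # n = 8
print(scheffe_interval(x, y, 2.571))
```

With $n = m$ the function reproduces the paired interval (B), since then $u_i = x_i - y_i$.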


³ Obvious modifications of (9) will make it suitable for "one-sided" estimation.





3. More General Solutions. We now generalize our approach to the following extent. Let $L$ be a linear form and $Q$ a quadratic form in the variates $x_1, \dots, x_m$, $y_1, \dots, y_n$, with coefficients independent of the parameters (i. of p.). Suppose that for some constant $h$ i. of p., and some function $f$ of the parameters, $h(L - \delta)/f$ and $Q/f^2$ are independently distributed, the former according to the normal law with zero mean and unit variance, the latter according to the $\chi^2$-law with $k - 1$ degrees of freedom. Then the quotient
$$h(L - \delta)/[Q/(k-1)]^{1/2} \tag{10}$$
will have the $t$-distribution with $k - 1$ degrees of freedom, no matter what the values of the parameters.

We note that necessarily then
$$E(L) = \delta, \tag{11}$$
$$f^2 = h^2E[(L - \delta)^2]. \tag{12}$$

The $t$-distribution of (10) leads to the confidence intervals
$$|\delta - L| \le h^{-1}t_{k-1,\epsilon}[Q/(k-1)]^{1/2}, \tag{13}$$
where $t_{k-1,\epsilon}$ is defined by (1), and the confidence coefficient is $\epsilon$. Proceeding as toward (3), we find that the expected length of (13) is
$$E(l) = t_{k-1,\epsilon}c_{k-1}f/h. \tag{14}$$

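If $L = \sum_{i=1}^m a_ix_i - \sum_{i=1}^n b_iy_i$,
$$E(L) = \alpha\sum_{i=1}^m a_i - \beta\sum_{i=1}^n b_i. \tag{15}$$
Since $a_i$, $b_i$ are i. of p., it follows from (11) and (15) that
$$\sum_{i=1}^m a_i = \sum_{i=1}^n b_i = 1. \tag{16}$$
Writing
$$\xi_i = x_i - \alpha, \quad \eta_i = y_i - \beta, \qquad L - \delta = \sum_{i=1}^m a_i\xi_i - \sum_{i=1}^n b_i\eta_i, \tag{17}$$
we have
$$E[(L - \delta)^2] = \mu\sum_{i=1}^m a_i^2 + \nu\sum_{i=1}^n b_i^2 = f^2/h^2 \tag{18}$$
from (12); thus (14) may be written
$$E(l) = t_{k-1,\epsilon}c_{k-1}\left(\mu\sum_{i=1}^m a_i^2 + \nu\sum_{i=1}^n b_i^2\right)^{1/2}. \tag{19}$$
From (18) we also have
$$f^2 = a^2\mu + b^2\nu, \tag{20}$$
where
$$a^2 = h^2\sum_{i=1}^m a_i^2, \qquad b^2 = h^2\sum_{i=1}^n b_i^2$$
are i. of p.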

4. Lemma. We propose to prove next that the maximum value of $k$ is $m$, that is to say, it is impossible to obtain a $t$-distribution for a quotient (10) with more than $m - 1$ degrees of freedom. For this we need a lemma to the effect that certain well known sufficient conditions for a quadratic form to have a $\chi^2$-distribution are also necessary.

Under our assumptions $h^2(L - \delta)^2/f^2 = \chi^2_1$ and $Q/f^2 = \chi^2_{k-1}$ are independent. Therefore $Q^*/f^2 = \chi^2_k$, where
$$Q^* = h^2(L - \delta)^2 + Q.$$

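To shorten the notation, write
$$z_i = \begin{cases} x_i, & i = 1, 2, \dots, m,\\ y_{i-m}, & i = m+1, \dots, m+n, \end{cases}$$
$$\alpha_i = E(z_i), \qquad \zeta_i = z_i - \alpha_i, \qquad \sigma_i^2 = E(\zeta_i^2), \qquad Q = \sum_{s,t} q_{st}z_sz_t,$$
where the indices $s$ and $t$ range from 1 to $m + n$ throughout. Then $q_{st}$ is i. of p., and
$$Q = \sum_{s,t} q_{st}\zeta_s\zeta_t + 2\sum_s q_s\zeta_s + q,$$
where
$$q_s = \sum_t q_{st}\alpha_t, \qquad q = \sum_{s,t} q_{st}\alpha_s\alpha_t.$$
From (17)
$$h^2(L - \delta)^2 = \sum_{s,t} p_{st}\zeta_s\zeta_t,$$
where the $p_{st}$ are i. of p. Putting $q^*_{st} = q_{st} + p_{st}$, the $q^*_{st}$ are i. of p., and
$$Q^* = \sum_{s,t} q^*_{st}\zeta_s\zeta_t + 2\sum_s q_s\zeta_s + q. \tag{21}$$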

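The moment-generating function of $Q^*/f^2$ is
$$\phi(\theta) = E[\exp(\theta Q^*/f^2)] = C_1\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\exp\Big\{\theta Q^*/f^2 - \tfrac12\sum_s \zeta_s^2/\sigma_s^2\Big\}\prod_s d\zeta_s.$$
There exists a non-singular linear transformation from the $\zeta$'s to $v$'s such that
$$\sum_{s,t} q^*_{st}\zeta_s\zeta_t = \sum_s \lambda_sv_s^2, \qquad \sum_s \zeta_s^2/\sigma_s^2 = \sum_s v_s^2. \tag{22}$$
Let the linear term $2\sum_s q_s\zeta_s$ of (21) become $2\sum_s p_sv_s$. Then
$$\phi(\theta) = C_2e^{\theta q/f^2}\prod_s\int_{-\infty}^{\infty}\exp\{-\tfrac12[v_s^2 - 2\theta(\lambda_sv_s^2 + 2p_sv_s)/f^2]\}\,dv_s$$
$$= e^{\theta q/f^2}\prod_s(1 - 2\theta\lambda_s/f^2)^{-1/2}\exp\Big\{\sum_s 2\theta^2p_s^2/(f^4 - 2\theta\lambda_sf^2)\Big\}. \tag{23}$$
Now $Q^*/f^2 = \chi^2_k$ if and only if
$$\phi(\theta) = (1 - 2\theta)^{-k/2}.$$
Hence
$$p_s = 0, \qquad q = 0, \tag{24}$$
and $k$ of the $\lambda_s$ must be equal to $f^2$ while the remaining $\lambda_s$ vanish. No generality is lost in assuming
$$\lambda_1 = \lambda_2 = \cdots = \lambda_k = f^2, \qquad \lambda_{k+1} = \cdots = \lambda_{m+n} = 0. \tag{25}$$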

Let $w_i = fv_i$, $i = 1, 2, \dots, k$. From equations (21) to (25) we deduce that
$$Q^* = \sum_{s,t} q^*_{st}\zeta_s\zeta_t = \sum_{i=1}^k w_i^2, \tag{26}$$
where $q^*_{st}$ is i. of p., and the $w_i$ are linear combinations of the $\zeta_s$ such that
$$E(w_iw_j) = f^2\delta_{ij}. \tag{27}$$
That the conditions (26) are necessary⁴ for $Q^*/f^2 = \chi^2_k$ constitutes the desired lemma.

5. Maximum Number of Degrees of Freedom. We have seen that the $w_i$ in (26) must be of the form
$$w_i = \sum_{j=1}^m a_{ij}\xi_j + \sum_{j=1}^n b_{ij}\eta_j. \tag{28}$$
We substitute (28) and (20) into (27) and write the result in matrix form,
$$\mu AA' + \nu BB' = (a^2\mu + b^2\nu)I_k, \tag{29}$$
where $I_j$ is the identity matrix of order $j$, $A^{k\times m} = (a_{ij})$, $B^{k\times n} = (b_{ij})$, and whenever a new matrix is introduced, a superscript $r \times c$ indicates that it has $r$ rows

⁴ We have incidentally proved sufficiency.
and $c$ columns. Now if we knew that $AA'$ and $BB'$ were i. of p., then we could equate coefficients of $\mu$ and $\nu$ in (29) and immediately draw the desired conclusion $k \le m$. But that $AA'$, $BB'$ are i. of p. is not obvious, since this need not be true of $A$ and $B$. However, we do know that the matrices
$$F^{m\times m} = A'A, \qquad G^{m\times n} = A'B, \qquad H^{n\times n} = B'B$$
are i. of p. because the matrix $(q^*_{st})$ of (26) is
$$\begin{bmatrix} F & G\\ G' & H \end{bmatrix}.$$
Multiplying (29) on the left by $A'$ and on the right by $A$, we obtain
$$\mu F^2 + \nu GG' = (a^2\mu + b^2\nu)F. \tag{30}$$
(30) must hold identically in $\mu$, $\nu$. Since the coefficients of $\mu$, $\nu$ are now i. of p., we may equate them; hence $GG' = b^2F$. Similarly, multiplying (29) by $B'$ and $B$ we get $G'G = a^2H$. Now⁵ for any matrix $M$, rank $M$ = rank $M'M$ = rank $MM'$. Thus rank $F$ = rank $H = r$, say. Again, $F = A'A$, therefore $r = \operatorname{rank} A \le m$. Since $F$ is a positive matrix, i. of p., there exists a non-singular $P^{m\times m}$, i. of p., such that
$$F = P'I_{m,r}P = A'A, \tag{31}$$
where $I_{j,r}$ is the $j \times j$ matrix the first $r$ of whose diagonal elements are unity and all other elements zero. Let $T^{k\times m} = AP^{-1}$. Then
$$A = TP, \qquad T'T = I_{m,r} \tag{32}$$
from (31). Likewise we can write
$$B = UR, \qquad U'U = I_{n,r}, \tag{33}$$
where $R$ is non-singular and i. of p. Then $G = A'B = P'T'UR$, hence $T'U = P'^{-1}GR^{-1}$ is i. of p. We note
$$T = (T_1^{k\times r}, 0), \qquad U = (U_1^{k\times r}, 0),$$
where
$$T_1'T_1 = U_1'U_1 = I_r.$$
Since
$$T'U = \begin{bmatrix} T_1'U_1 & 0\\ 0 & 0 \end{bmatrix}$$

⁵ A simple proof [6b] of this useful theorem is the following. Let $r = \operatorname{rank} M$, $\rho = \operatorname{rank} M'M$. Then $\rho \le r$, since the rank of a product cannot exceed the rank of a factor. $M$ contains $r$ independent column vectors; the Grammian matrix of these vectors is non-singular and appears as an $r \times r$ minor in $M'M$. Hence $\rho \ge r$. Furthermore, all principal minors of $M'M$ are Grammian matrices (which always have non-negative determinants), hence $M'M$ is always positive; we use this below.
is i. of p., so is its minor $V^{r\times r} = T_1'U_1$. Write
$$P = \begin{bmatrix} P_1^{r\times m}\\ P_2 \end{bmatrix}, \qquad R = \begin{bmatrix} R_1^{r\times n}\\ R_2 \end{bmatrix}.$$
Then from (32), (33),
$$A = T_1P_1, \qquad B = U_1R_1. \tag{34}$$
Substituting (34) in (29), we get
$$\mu T_1P_1P_1'T_1' + \nu U_1R_1R_1'U_1' = (a^2\mu + b^2\nu)I_k, \tag{35}$$
and multiplying by $T_1'$ on the left, $T_1$ on the right,
$$\mu P_1P_1' + \nu VR_1R_1'V' = (a^2\mu + b^2\nu)I_r.$$
Again the coefficients of $\mu$, $\nu$ are i. of p., so
$$P_1P_1' = a^2I_r, \qquad VR_1R_1'V' = b^2I_r. \tag{36}$$
Similarly we find
$$R_1R_1' = b^2I_r. \tag{37}$$
From (36), (37), $VV' = I_r$. (35) now becomes
$$a^2\mu T_1T_1' + b^2\nu U_1U_1' = (a^2\mu + b^2\nu)I_k. \tag{38}$$
Multiplication of (38) on the right by $U_1$ gives
$$a^2\mu T_1V + b^2\nu U_1 = (a^2\mu + b^2\nu)U_1.$$
Hence $T_1V = U_1$, therefore $U_1U_1' = T_1T_1'$, and putting this back into (38) we have $I_k = T_1T_1'$, rank $I_k$ = rank $T_1T_1'$ = rank $T_1'T_1$ = rank $I_r$, so that $k = r \le m$.

6. Minimum Expected Length of Confidence Intervals. We now point out that of all confidence intervals (13) with $k = m$, the confidence intervals (9) have the minimum expected length. Recalling that the $a_i$, $b_i$ in (19) are subject to the conditions (16), we easily find
$$\sum_{i=1}^m a_i^2 \ge 1/m, \qquad \sum_{i=1}^n b_i^2 \ge 1/n. \tag{39}$$
From (39) and (19) we have
$$E(l) \ge t_{m-1,\epsilon}c_{m-1}[\mu + (m/n)\nu]^{1/2}/m^{1/2},$$
and, referring to the statement at the end of section 2, the property of (9) asserted above is now obvious.

7. Asymptotic Shortness of Confidence Intervals. In conclusion we wish to compare our results with the case where the ratio of the variances, $\theta = \nu/\mu$, is known. Let
$$S_x = \sum_{i=1}^m (x_i - \bar x)^2, \qquad S_y = \sum_{i=1}^n (y_i - \bar y)^2, \qquad L = \bar x - \bar y, \qquad \sigma_L^2 = (\mu/m) + (\nu/n).$$
Then $(L - \delta)/\sigma_L$, $S_x/\mu$ and $S_y/\nu$ are mutually independently distributed. The first is normal with zero mean and unit variance, and $S_x/\mu = \chi^2_{m-1}$, $S_y/\nu = \chi^2_{n-1}$. Hence
$$(L - \delta)\big/\big[(m^{-1} + \theta n^{-1})(S_x + S_y\theta^{-1})/(m + n - 2)\big]^{1/2} = t_{m+n-2}.$$


TABLE I

Values of R for ε = .95

                 n - 1
m - 1      5      10     20     40     ∞
   5     1.15   1.20   1.23   1.25   1.28
  10            1.05   1.07   1.09   1.11
  20                   1.03   1.03   1.05
  40                          1.01   1.02

TABLE II

Values of R for ε = .99

                 n - 1
m - 1      5      10     20     40     ∞
   5     1.27   1.36   1.42   1.47   1.52
  10            1.10   1.13   1.16   1.20
  20                   1.05   1.06   1.09
  40                          1.02   1.04


This relation yields the confidence intervals
$$|\delta - L| \le t_{m+n-2,\epsilon}\big[(m+n-2)^{-1}(m^{-1} + \theta n^{-1})(S_x + S_y/\theta)\big]^{1/2}, \tag{40}$$
where the confidence coefficient is again $\epsilon$. The confidence intervals (40) are known to be highly efficient; for instance they are of the shortest unbiased type [5a]. We calculate their expected length to be
$$E(l') = t_{m+n-2,\epsilon}c_{m+n-2}[\mu + (m/n)\nu]^{1/2}/m^{1/2}.$$
The ratio $R$ of $E(l)$ for (9) to $E(l')$ for (40) is thus
$$R = (t_{m-1,\epsilon}c_{m-1})/(t_{m+n-2,\epsilon}c_{m+n-2}). \tag{41}$$
As $k \to \infty$, $C_k \to 2$ and $t_{k,\epsilon} \to t_{\infty,\epsilon}$; hence as $m \to \infty$, $R \to 1$ no matter what the
values of $n \ge m$. For small values of $m$ the ratio of the $t$ values in (41) is
considerably $> 1$, but this is partly offset by $C_k$ approaching its limiting value 2
from below, so that the ratio of the $C$'s is $< 1$. The behaviour of $R$ for finite $m$
is indicated in Tables I and II. Table I (II) tells us, for example, that with
$m > 10$ and $\epsilon = .95$ (.99), the expected length of the confidence intervals (9)
is at most 11 per cent (20 per cent) longer than that of the optimum confidence
intervals (40) available when the ratio $\theta$ is known. While we may conclude from
$R \to 1$ as $m \to \infty$ that our solution (9) is asymptotically extremely efficient, we
cannot conclude from Tables I and II that for small $m$ (9) is inefficient, since we
do not know what the lengthening effect of the extra nuisance parameter in the
Behrens-Fisher problem would be on "best" confidence intervals.
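A sketch for recomputing entries of Tables I and II from (41). It reads $C_k$ as $2E[(\chi^2_k/k)^{1/2}] = 2(2/k)^{1/2}\Gamma(\frac{k+1}{2})/\Gamma(\frac{k}{2})$; this is our assumption, chosen because it is consistent with the expected-length formulas above and with $C_k \to 2$ from below. The $t$ quantile is found by Simpson integration of the $t$ density plus bisection, so only the Python standard library is needed; the helper names are ours. By our hand check, the rows with $m - 1 \ge 10$ agree with the printed tables to about two decimals, while the $m - 1 = 5$ row comes out a few hundredths lower than printed, a difference we have not tried to reconcile.

```python
import math
from statistics import NormalDist


def c_k(k):
    """Our reading of C_k: 2 E[(chi2_k / k)^(1/2)]; equals 2 in the limit."""
    if k == math.inf:
        return 2.0
    log_ratio = math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
    return 2.0 * math.sqrt(2.0 / k) * math.exp(log_ratio)


def t_quantile(k, eps, steps=400):
    """Two-sided critical value t_{k,eps}, i.e. P(|t_k| <= t) = eps."""
    if k == math.inf:
        return NormalDist().inv_cdf(0.5 + eps / 2)
    const = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)

    def density(x):
        return const * (1 + x * x / k) ** (-(k + 1) / 2)

    def mass_from_zero(x):
        # Simpson's rule for the integral of the density over [0, x]
        h = x / steps
        total = density(0.0) + density(x)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * density(i * h)
        return total * h / 3

    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection: mass_from_zero is increasing in x
        mid = (lo + hi) / 2
        if mass_from_zero(mid) < eps / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ratio_r(m, n, eps):
    """R of (41); pass n = math.inf for the limiting column of the tables."""
    num = t_quantile(m - 1, eps) * c_k(m - 1)
    den = t_quantile(m + n - 2, eps) * c_k(m + n - 2)
    return num / den


if __name__ == "__main__":
    # Row m - 1 = 10 of Table I (columns n - 1 = 10, 20, 40, infinity).
    print([round(ratio_r(11, n1 + 1, 0.95), 2) for n1 in (10, 20, 40, math.inf)])
```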

8. Summary. In the terminology of the first paragraphs of sections 1 and 3,
we have proved that there do not exist a linear form $L$ and a quadratic form $Q$
in the observations such that the quotient (10) will have the $t$-distribution (for
all values of the parameters) with more than $m - 1$ degrees of freedom. We
have further shown that of all confidence intervals (13) based on the $t$-distribution
with $m - 1$ degrees of freedom, and with confidence coefficient $\epsilon$, (9) has
the minimum expected length. The quantities needed to apply our solution
(9) are given by (1), (7) and (8). Finally, by comparing this solution with a
known highly efficient solution for the case when the ratio of the population
variances is known, it has been possible to show that at least asymptotically
our confidence intervals (9) are very short.
AN EXTENSION OF WILKS' METHOD FOR SETTING TOLERANCE LIMITS


By Abraham Wald 
Columbia University 

1. Introduction. Let $x$ be a random variable and let $f(x)$ be its probability
density function. Suppose that nothing is known about $f(x)$ except that it is
continuous. Let $x_1, \ldots, x_n$ be $n$ independent observations on $x$. The problem
of setting tolerance limits can be formulated as follows: For some given
positive values $\beta < 1$ and $\gamma < 1$ we have to construct two functions $L(x_1, \ldots, x_n)$
and $M(x_1, \ldots, x_n)$, called tolerance limits, such that the probability that

$$\int_{L}^{M} f(t)\,dt \ge \gamma \tag{1}$$

holds is equal to $\beta$. This problem has recently been solved by S. S. Wilks¹ in
a very satisfactory way when nothing is known about $f(x)$ except that it is
continuous. Wilks proposes the following solution: Let $x_1, \ldots, x_n$ be the observed
values of $x$ arranged in order of increasing magnitude. Then $L = x_r$ and
$M = x_{n-r+1}$, where $r$ denotes a positive integer. The exact sampling distribution
of the statistic $\int_{x_r}^{x_{n-r+1}} f(t)\,dt$ is derived by Wilks, and this provides the solution
for the problem of setting tolerance limits. A very important feature of Wilks'
solution is that the distribution of $\int_{x_r}^{x_{n-r+1}} f(t)\,dt$ is entirely independent
of the unknown density function $f(x)$, i.e. it is the same for any arbitrary
continuous density function $f(x)$.
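Wilks' distribution-free property can be checked by simulation: with $L = x_r$ and $M = x_{n-r+1}$, the covered probability $F(M) - F(L)$ should have the same distribution whatever continuous parent is used. The sketch below (function names ours) compares the average coverage under a uniform and an exponential parent with $(n - 2r + 1)/(n + 1)$, the mean of the corresponding beta distribution; that mean is a standard order-statistics fact we supply, not a formula quoted from this paper.

```python
import math
import random


def wilks_coverage(sample, cdf, r):
    """F(M) - F(L) for Wilks' limits L = x_r, M = x_{n-r+1} (1-based ranks)."""
    xs = sorted(sample)
    n = len(xs)
    return cdf(xs[n - r]) - cdf(xs[r - 1])


def mean_coverage(draw, cdf, n, r, reps, rng):
    """Average coverage over `reps` simulated samples of size n."""
    total = 0.0
    for _ in range(reps):
        total += wilks_coverage([draw(rng) for _ in range(n)], cdf, r)
    return total / reps


rng = random.Random(1)
n, r, reps = 20, 2, 4000
uniform_mean = mean_coverage(lambda g: g.random(), lambda x: x, n, r, reps, rng)
expo_mean = mean_coverage(lambda g: g.expovariate(1.0),
                          lambda x: 1.0 - math.exp(-x), n, r, reps, rng)
theory = (n - 2 * r + 1) / (n + 1)  # mean of the Beta(n - 2r + 1, 2r) law
print(uniform_mean, expo_mean, theory)
```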

In this paper we shall give an extension of Wilks' method to the multivariate
case. Let $x_1, \ldots, x_p$ be a set of $p$ random variables with the joint probability
density function $f(x_1, \ldots, x_p)$. Suppose that nothing is known about
$f(x_1, \ldots, x_p)$ except that it is a continuous function of $x_1, \ldots, x_p$. A sample
of $n$ independent observations is drawn and the $\alpha$-th observation on $x_i$ is denoted
by $x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, n$). The problem of setting tolerance limits
for $x_1, \ldots, x_p$ can be formulated as follows: For some given positive values
$\beta < 1$ and $\gamma < 1$ we have to construct $p$ pairs of functions of the observations
$L_i(x_{11}, \ldots, x_{pn})$ and $M_i(x_{11}, \ldots, x_{pn})$ ($i = 1, \ldots, p$) such that the probability that

$$\int_{L_p}^{M_p} \cdots \int_{L_1}^{M_1} f(t_1, \ldots, t_p)\,dt_1 \cdots dt_p \ge \gamma \tag{2}$$


¹ S. S. Wilks, "Determination of sample sizes for setting tolerance limits," Annals of
Math. Stat., Vol. 12 (1941).
holds is equal to $\beta$. The functions $L_i$ and $M_i$ are called the lower and upper
tolerance limits of $x_i$. A natural extension of Wilks' procedure would seem to
be the following: Let $x_{i1}, \ldots, x_{in}$ be the observations on $x_i$ arranged in order
of increasing magnitude and let $L_i = x_{ir_i}$ and $M_i = x_{is_i}$ ($i = 1, \ldots, p$), where
$r_i$ and $s_i$ denote some integers. However, this choice of the tolerance limits
does not provide a satisfactory solution of our problem, since the distribution
of (2) is not independent of the unknown density function $f(x_1, \ldots, x_p)$. It
will be shown in this paper that by a slight modification of the above procedure
the distribution of (2) becomes entirely independent of the unknown density
function $f(x_1, \ldots, x_p)$. In section 2 we will treat the bivariate case and in
section 3 we will extend the results to multivariate distributions.


2. The bivariate case. In this section we deal with the case when $p = 2$.
Let $x_{11}, \ldots, x_{1n}$ be the observations on $x_1$ arranged in order of increasing
magnitude. We may assume that $x_{11} < x_{12} < \cdots < x_{1n}$, since the probability of an
equality sign is equal to zero. We define

$$L_1 = x_{1r_1} \quad \text{and} \quad M_1 = x_{1s_1}, \tag{3}$$

where $r_1$ and $s_1$ denote some positive integers and $r_1 < s_1 \le n$. Consider only
those sample points $(x_{1\alpha}, x_{2\alpha})$ for which $x_{1r_1} < x_{1\alpha} < x_{1s_1}$, i.e. consider
the sample points $(x_{1,r_1+1}, x_{2,r_1+1}), \ldots, (x_{1,s_1-1}, x_{2,s_1-1})$. Denote by
$x'_{21}, \ldots, x'_{2,s_1-r_1-1}$ the values $x_{2,r_1+1}, \ldots, x_{2,s_1-1}$ arranged in order of increasing
magnitude. We define

$$L_2 = x'_{2r_2} \quad \text{and} \quad M_2 = x'_{2s_2}, \tag{4}$$

where $r_2$ and $s_2$ denote some positive integers for which $r_2 < s_2 \le s_1 - r_1 - 1$.

We will show that the distribution of the statistic

$$Q = \int_{L_2}^{M_2}\int_{L_1}^{M_1} f(t_1, t_2)\,dt_1\,dt_2 \tag{5}$$

is entirely independent of the unknown density function $f(x_1, x_2)$. Denote by
$\varphi(x_1)$ the marginal distribution of $x_1$, i.e.

$$\varphi(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2. \tag{6}$$

Furthermore denote by $\psi(x_2, L_1, M_1)$ the conditional distribution of $x_2$ calculated
under the condition that $L_1 < x_1 < M_1$. Hence

$$\psi(x_2, L_1, M_1) = \frac{\int_{L_1}^{M_1} f(x_1, x_2)\,dx_1}{\int_{-\infty}^{\infty}\int_{L_1}^{M_1} f(x_1, x_2)\,dx_1\,dx_2}. \tag{7}$$


Let

$$P = \int_{L_1}^{M_1} \varphi(t)\,dt \tag{8}$$

and

$$\bar P = \int_{L_2}^{M_2} \psi(t, L_1, M_1)\,dt. \tag{9}$$

From (5), (8) and (9) it follows that

$$Q = P\bar P. \tag{10}$$

It is obvious that the distribution of $P$ is given by Wilks' formula. Since Wilks
derived the distribution only when $s_1 = n - r_1 + 1$, we will briefly give here the
derivation for any integers $r_1$ and $s_1$.

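Let $\int_{-\infty}^{L_1} \varphi(t)\,dt = u$ and $\int_{M_1}^{\infty} \varphi(t)\,dt = v$. Then the joint probability density
function of $u$ and $v$ is given by

$$c\,u^{r_1-1} v^{n-s_1} (1 - u - v)^{s_1-r_1-1}\,du\,dv, \tag{11}$$

where $c$ is a constant. We obviously have $P = 1 - u - v$. The joint density
function of $P$ and $u$ is given by

$$c\,u^{r_1-1} (1 - u - P)^{n-s_1} P^{s_1-r_1-1}\,du\,dP, \tag{12}$$

where $u$ is restricted to the interval $[0, 1 - P]$. Hence the distribution of $P$
is given by

$$c\,P^{s_1-r_1-1}\,dP \int_0^{1-P} u^{r_1-1}(1 - P - u)^{n-s_1}\,du = c\,P^{s_1-r_1-1}(1-P)^{n-s_1+r_1}\,dP \int_0^1 \tau^{r_1-1}(1-\tau)^{n-s_1}\,d\tau = c'\,P^{s_1-r_1-1}(1-P)^{n-s_1+r_1}\,dP.$$

Since the integral of the density function of $P$ over the range of $P$ must be
equal to 1, we find that

$$c' = \frac{\Gamma(n+1)}{\Gamma(s_1 - r_1)\,\Gamma(n - s_1 + r_1 + 1)}.$$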


Hence the probability density function of $P$ is given by

$$\frac{\Gamma(n+1)}{\Gamma(s_1 - r_1)\,\Gamma(n - s_1 + r_1 + 1)}\,P^{s_1-r_1-1}(1 - P)^{n-s_1+r_1}\,dP. \tag{13}$$

Since $x_{2,r_1+1}, \ldots, x_{2,s_1-1}$ can be considered as $s_1 - r_1 - 1$ independent observations
on a random variable $t$ the distribution of which is given by $\psi(t, L_1, M_1)\,dt$,
for any given values $L_1, M_1$ the conditional distribution of $\bar P$ is given by the
expression we obtain from (13) by substituting $r_2$ for $r_1$, $s_2$ for $s_1$ and $s_1 - r_1 - 1$
for $n$. Hence the conditional distribution of $\bar P$ is given by

$$\frac{\Gamma(s_1 - r_1)}{\Gamma(s_2 - r_2)\,\Gamma(s_1 - r_1 - s_2 + r_2)}\,\bar P^{s_2-r_2-1}(1 - \bar P)^{s_1-r_1-1-s_2+r_2}\,d\bar P. \tag{14}$$

Since the expression (14) does not involve the quantities $L_1$ and $M_1$, $\bar P$ is distributed
independently of $L_1$ and $M_1$. Hence the joint density function of $P$
and $\bar P$ is given by the product of (13) and (14), i.e. by

$$A\,P^{s_1-r_1-1}(1 - P)^{n-s_1+r_1}\,\bar P^{s_2-r_2-1}(1 - \bar P)^{s_1-r_1-1-s_2+r_2}\,dP\,d\bar P, \tag{15}$$

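where $A$ denotes the product of the constant coefficients in (13) and (14). From
(15) it follows that the joint distribution of $P$ and $Q = P\bar P$ is given by

$$A\,Q^{s_2-r_2-1}(1 - P)^{n-s_1+r_1}(P - Q)^{s_1-r_1-1-s_2+r_2}\,dP\,dQ. \tag{16}$$

Since the range of $P$ is the interval $[Q, 1]$, the distribution of $Q$ is given by

$$A\,Q^{s_2-r_2-1}\,dQ \int_Q^1 (1 - P)^{n-s_1+r_1}(P - Q)^{s_1-r_1-1-s_2+r_2}\,dP. \tag{17}$$

Let $R = P - Q$. Then we have

$$\int_Q^1 (1 - P)^{n-s_1+r_1}(P - Q)^{s_1-r_1-1-s_2+r_2}\,dP = \int_0^{1-Q} (1 - Q - R)^{n-s_1+r_1} R^{s_1-r_1-1-s_2+r_2}\,dR = (1 - Q)^{n-s_2+r_2}\int_0^1 (1 - \tau)^{n-s_1+r_1}\tau^{s_1-r_1-1-s_2+r_2}\,d\tau. \tag{18}$$

From (17) and (18) it follows that the probability density function of $Q$
is given by

$$\frac{\Gamma(n+1)}{\Gamma(s_2 - r_2)\,\Gamma(n - s_2 + r_2 + 1)}\,Q^{s_2-r_2-1}(1 - Q)^{n-s_2+r_2}\,dQ. \tag{19}$$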
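A Monte Carlo sketch of the construction (3)-(4) and of the claim that the distribution of $Q$ does not depend on $f$. It draws samples from two illustrative models of our own choosing: independent uniforms, where $Q$ is simply the area of the rectangle, and the dependent pair $x_1 = u_1$, $x_2 = u_1 + u_2$ with $u_1, u_2$ independent uniforms, where the rectangle probability has a closed form. Under (19) both averages of $Q$ should be close to $(s_2 - r_2)/(n + 1)$. All function names are ours; the 1-based ranks follow the text.

```python
import random


def wald_rectangle(points, r1, s1, r2, s2):
    """Rectangle (L1, M1, L2, M2) of (3)-(4) from a list of (x1, x2) pairs.

    Ranks are 1-based: r1 < s1 <= n and r2 < s2 <= s1 - r1 - 1.
    """
    pts = sorted(points)                              # order by x1
    L1, M1 = pts[r1 - 1][0], pts[s1 - 1][0]           # x_{1 r1}, x_{1 s1}
    inner = sorted(x2 for _, x2 in pts[r1:s1 - 1])    # x2 of points strictly between
    return L1, M1, inner[r2 - 1], inner[s2 - 1]       # x'_{2 r2}, x'_{2 s2}


def ramp_integral(u):
    """Integral of min(max(t, 0), 1) from -infinity to u."""
    if u <= 0:
        return 0.0
    if u <= 1:
        return u * u / 2
    return u - 0.5


def q_independent(a, b, c, d):
    """Q for x1, x2 independent U(0, 1): product of the side lengths."""
    return (b - a) * (d - c)


def q_dependent(a, b, c, d):
    """Q for x1 = u1, x2 = u1 + u2 with u1, u2 independent U(0, 1)."""
    # P(c < x2 < d | x1 = t) = ramp(d - t) - ramp(c - t); integrate t over (a, b).
    return (ramp_integral(d - a) - ramp_integral(d - b)) - (
        ramp_integral(c - a) - ramp_integral(c - b))


def draw_independent(g):
    return g.random(), g.random()


def draw_dependent(g):
    u1 = g.random()
    return u1, u1 + g.random()


def mean_q(q, draw, n, ranks, reps, rng):
    total = 0.0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        total += q(*wald_rectangle(sample, *ranks))
    return total / reps


rng = random.Random(7)
n, ranks = 30, (2, 29, 2, 25)                 # r1, s1, r2, s2
theory = (ranks[3] - ranks[2]) / (n + 1)      # mean of density (19)
indep = mean_q(q_independent, draw_independent, n, ranks, 3000, rng)
dep = mean_q(q_dependent, draw_dependent, n, ranks, 3000, rng)
print(indep, dep, theory)
```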


3. The multivariate case. We may assume that no two elements of the matrix
$\|x_{i\alpha}\|$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, n$) are equal, since the probability of this
event is equal to 1. For each $\alpha$ let $t_\alpha$ ($\alpha = 1, \ldots, n$) be the point with the
coordinates $x_{1\alpha}, \ldots, x_{p\alpha}$. Let $x_{11}, \ldots, x_{1n}$ be the observations on $x_1$ arranged
in order of increasing magnitude. Then $L_1 = x_{1r_1}$ and $M_1 = x_{1s_1}$. The quantities
$L_i$ and $M_i$ ($i = 2, \ldots, p$) are defined in the following manner: Let $S$ be
the set of all points $t_\alpha$ for which

$$L_j < x_{j\alpha} < M_j \qquad (j = 1, \ldots, i - 1).$$

Arrange the $i$-th coordinates of the points in $S$ in order of increasing magnitude.
Then $L_i$ is equal to the $r_i$-th element and $M_i$ is equal to the $s_i$-th element of this
ordered sequence. We will derive the distribution of

$$Q_p = \int_{L_p}^{M_p} \cdots \int_{L_1}^{M_1} f(t_1, \ldots, t_p)\,dt_1 \cdots dt_p. \tag{20}$$
Let

$$Q_i = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \int_{L_i}^{M_i} \cdots \int_{L_1}^{M_1} f(x_1, \ldots, x_p)\,dx_1 \cdots dx_p \qquad (i = 1, \ldots, p - 1). \tag{21}$$

Denote by $\varphi_i(x_i, L_1, M_1, \ldots, L_{i-1}, M_{i-1})$ ($i = 2, \ldots, p$) the conditional
probability density function of $x_i$ calculated under the condition that
$L_j < x_j < M_j$ ($j = 1, \ldots, i - 1$). Let

$$P_i = \int_{L_i}^{M_i} \varphi_i(x_i, L_1, M_1, \ldots, L_{i-1}, M_{i-1})\,dx_i. \tag{22}$$

We obviously have

$$Q_{i+1} = Q_i P_{i+1} \qquad (i = 1, \ldots, p - 1). \tag{23}$$
We will prove that the probability density function of $Q_i$ is given by

$$\frac{\Gamma(n+1)}{\Gamma(s_i - r_i)\,\Gamma(n - s_i + r_i + 1)}\,Q_i^{s_i-r_i-1}(1 - Q_i)^{n-s_i+r_i}\,dQ_i \qquad (i = 1, \ldots, p). \tag{24}$$

This is certainly true for $i = 1, 2$. We will assume that it is true for $i = j$ and
we will prove it for $i = j + 1$. It is easy to see that $Q_j$ and $P_{j+1}$ are independently
distributed and that the probability density function of $P_{j+1}$ is given by

$$\frac{\Gamma(s_j - r_j)}{\Gamma(s_{j+1} - r_{j+1})\,\Gamma(s_j - r_j - s_{j+1} + r_{j+1})}\,P_{j+1}^{s_{j+1}-r_{j+1}-1}(1 - P_{j+1})^{s_j-r_j-1-s_{j+1}+r_{j+1}}\,dP_{j+1}. \tag{25}$$

The joint distribution of $Q_j$ and $P_{j+1}$ is of the same form as the joint distribution
of $P$ and $\bar P$ in section 2. Hence the distribution of $Q_jP_{j+1}$ can be obtained
from the distribution of $Q = P\bar P$ by substituting $r_{j+1}$ for $r_2$ and $s_{j+1}$ for $s_2$. The
distribution of $Q$ is given in (19). Making the above substitution in formula
(19) we obtain formula (24) for $i = j + 1$. Hence the validity of (24) is proved
for $i = 1, 2, \ldots, p$. In particular, the distribution of $Q_p$ is given by


$$\frac{\Gamma(n+1)}{\Gamma(s_p - r_p)\,\Gamma(n - s_p + r_p + 1)}\,Q_p^{s_p-r_p-1}(1 - Q_p)^{n-s_p+r_p}\,dQ_p. \tag{26}$$


It is interesting to note that the distribution of $Q_p$ does not depend on the
integers $r_1, s_1, \ldots, r_{p-1}, s_{p-1}$. The construction of the tolerance limits
$L_i, M_i$ ($i = 1, \ldots, p$), as proposed here, is somewhat asymmetric, since it
depends on the order of the variates $x_1, \ldots, x_p$. In practical applications the
asymmetry of the construction will be very slight, since in most practical cases
the integers $r_p$ and $s_p$ will be chosen so that $(s_p - r_p - 1)/n$ will be near 1.
If, for example, $(s_p - r_p - 1)/n > .95$, the tolerance limits will be affected only
very slightly by a permutation of the variates $x_1, \ldots, x_p$. However, it would
be desirable to find a construction which is entirely independent of the order of
the variates $x_1, \ldots, x_p$.
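The $p$-variate construction is a loop over the coordinates that keeps only the points still strictly inside all earlier limits. The sketch below (names ours) implements it and, using independent uniforms so that $Q_p = \prod_i (M_i - L_i)$, checks that the average of $Q_p$ stays near $(s_p - r_p)/(n + 1)$ for two different choices of the earlier ranks, as (26) implies.

```python
import math
import random


def wald_box(points, ranks):
    """Limits [(L_1, M_1), ..., (L_p, M_p)] built as in section 3.

    points: list of p-tuples; ranks: list of 1-based (r_i, s_i). Step i uses
    only the points strictly inside all earlier limits (the set S).
    """
    limits = []
    current = list(points)
    for i, (r, s) in enumerate(ranks):
        coords = sorted(pt[i] for pt in current)
        lo, hi = coords[r - 1], coords[s - 1]
        limits.append((lo, hi))
        current = [pt for pt in current if lo < pt[i] < hi]
    return limits


def mean_box_coverage(n, ranks, reps, rng):
    """Average Q_p when x_1, ..., x_p are independent U(0, 1)."""
    p = len(ranks)
    total = 0.0
    for _ in range(reps):
        pts = [tuple(rng.random() for _ in range(p)) for _ in range(n)]
        total += math.prod(hi - lo for lo, hi in wald_box(pts, ranks))
    return total / reps


rng = random.Random(2)
n = 40
first = mean_box_coverage(n, [(1, 40), (2, 36), (3, 30)], 3000, rng)
second = mean_box_coverage(n, [(3, 38), (1, 34), (3, 30)], 3000, rng)
theory = (30 - 3) / (n + 1)   # depends only on (r_p, s_p), by (26)
print(first, second, theory)
```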
4. Tolerance regions composed of several rectangles. For the sake of simplicity
we will consider here the bivariate case. All results obtained in this
section can be extended without any difficulty to the multivariate case.

In section 2 the tolerance region has been a single rectangle in the plane $(x_1, x_2)$
determined by the four lines $x_1 = L_1$, $x_1 = M_1$, $x_2 = L_2$ and $x_2 = M_2$. If the
variates $x_1$ and $x_2$ are strongly correlated, a tolerance region of rectangular shape
seems to be unfavorable, since it will cover an unnecessarily large area in the
$(x_1, x_2)$ plane. The situation is illustrated in figure 1, where the scatter of a
bivariate sample of size $n = 19$ is shown. Suppose we choose $r_1 = 3$, $s_1 = 17$;
$r_2 = 1$, $s_2 = 13$; then the tolerance region $T$, as defined in section 2, will be the
rectangle determined by the lines $x_1 = L_1 = x_{1,3}$; $x_1 = M_1 = x_{1,17}$;
$x_2 = L_2 = x'_{2,1}$; and $x_2 = M_2 = x'_{2,13}$. Now consider the tolerance region $T'$
consisting of 3 small rectangles $R_1$, $R_2$ and $R_3$ defined as follows:

The rectangle $R_1$ is determined by the vertical lines through $x_{1,1}$ and $x_{1,7}$ and
the horizontal lines through the sample points with smallest and largest ordinate,
restricting ourselves to points which have abscissa values in the interior of the
interval $[x_{1,1}, x_{1,7}]$. Similarly $R_2$ is determined by the vertical lines through
$x_{1,7}$ and $x_{1,13}$ and the horizontal lines through the sample points with largest
and smallest ordinate, restricting ourselves to points with abscissa values in the
interior of $[x_{1,7}, x_{1,13}]$. Finally $R_3$ is determined by the vertical lines through
$x_{1,13}$ and $x_{1,19}$ and the horizontal lines through the sample points with largest
and smallest ordinate, restricting ourselves to points whose abscissa values lie in
the interior of $[x_{1,13}, x_{1,19}]$. The region $T'$ consisting of the rectangles $R_1$, $R_2$
and $R_3$ has a much smaller area than the region $T$. As we will see later, the
probability distribution of the statistic $\iint_{T'} f(x_1, x_2)\,dx_1\,dx_2$ is exactly the same as
that of $\iint_{T} f(x_1, x_2)\,dx_1\,dx_2$. Thus the use of $T'$ may be preferred to that of $T$.

We will consider tolerance regions $T^*$ of the following general shape: Let
$m_1, \ldots, m_k$ be $k$ positive integers such that $1 \le m_1 < m_2 < \cdots < m_k \le n$ and
$m_{i+1} - m_i \ge 3$, where $n$ is the size of the bivariate sample. Let $V_i$ be the vertical
line in the $(x_1, x_2)$ plane given by the equation $x_1 = x_{1,m_i}$ ($i = 1, \ldots, k$). The number of
sample points which lie between the vertical lines $V_i$ and $V_{i+1}$ is obviously equal
to $m_{i+1} - m_i - 1$. Through each point which lies between the vertical lines
$V_i$ and $V_{i+1}$ we draw a horizontal line. In this way we obtain $m_{i+1} - m_i - 1$
horizontal lines $W_{i,1}, \ldots, W_{i,m_{i+1}-m_i-1}$, where the line $W_{i,j+1}$ is above the line
$W_{i,j}$. Denote by $R_{ij}$ ($i = 1, \ldots, k - 1$; $j = 1, \ldots, m_{i+1} - m_i - 2$) the
rectangle determined by the lines $V_i$, $V_{i+1}$, $W_{i,j}$, $W_{i,j+1}$. Let $T^*$ be a region
composed of $s$ different rectangles $R_{ij}$. The regions $T$ and $T'$ in the example
illustrated in figure 1 are special cases of the type of regions $T^*$ described
above. For the region $T$ we have $k = 2$, $m_1 = 3$, $m_2 = 17$, $s = 12$, and for the
region $T'$ we have $k = 4$, $m_1 = 1$, $m_2 = 7$, $m_3 = 13$, $m_4 = 19$ and $s = 12$.
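As a quick arithmetic check of the counts just quoted: between $V_i$ and $V_{i+1}$ there are $m_{i+1} - m_i - 2$ rectangles, so when $T^*$ takes every available rectangle, as $T$ and $T'$ do, $s$ is the sum of these numbers. The sketch below (function name ours) computes that sum; the gap condition $m_{i+1} - m_i \ge 3$ it enforces is our reading of a partly illegible constraint in the source.

```python
def rectangle_count(m, n):
    """Total number of rectangles R_ij between consecutive vertical lines.

    m: abscissa ranks m_1 < ... < m_k of the lines V_i (1-based); between
    V_i and V_{i+1} there are m_{i+1} - m_i - 1 points, hence
    m_{i+1} - m_i - 1 horizontal lines and m_{i+1} - m_i - 2 rectangles.
    """
    ok = 1 <= m[0] and m[-1] <= n and all(b - a >= 3 for a, b in zip(m, m[1:]))
    if not ok:
        raise ValueError("need 1 <= m_1 < ... < m_k <= n with gaps of at least 3")
    return sum(b - a - 2 for a, b in zip(m, m[1:]))


print(rectangle_count([3, 17], 19), rectangle_count([1, 7, 13, 19], 19))
```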

Let $Q^*$ be given by $\iint_{T^*} f(x_1, x_2)\,dx_1\,dx_2$. We will prove that the probability
density function of $Q^*$ is given by

$$\frac{\Gamma(n+1)}{\Gamma(s)\,\Gamma(n - s + 1)}\,Q^{*\,s-1}(1 - Q^*)^{n-s}\,dQ^*. \tag{27}$$


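Let $\psi_i(x_2)\,dx_2$ be the conditional distribution of $x_2$ under the restriction that
$x_{1,m_i} < x_1 < x_{1,m_{i+1}}$. Thus, we have

$$\psi_i(x_2) = \frac{\int_{x_{1,m_i}}^{x_{1,m_{i+1}}} f(x_1, x_2)\,dx_1}{\int_{-\infty}^{\infty}\int_{x_{1,m_i}}^{x_{1,m_{i+1}}} f(x_1, x_2)\,dx_1\,dx_2}. \tag{28}$$

Denote by $\varphi(x_1)\,dx_1$ the marginal distribution of $x_1$, i.e. $\varphi(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2$.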
Let

$$P_i^* = \int_{x_{1,m_i}}^{x_{1,m_{i+1}}} \varphi(x_1)\,dx_1 \qquad (i = 1, \ldots, k - 1) \tag{29}$$

and

$$\bar P_i^* = \sum_j \int_{a_{ij}}^{b_{ij}} \psi_i(x_2)\,dx_2 \qquad (i = 1, \ldots, k - 1), \tag{30}$$

where $a_{ij}$ is the ordinate of the lower corners and $b_{ij}$ is the ordinate of the upper
corners of the rectangle $R_{ij}$, and the summation is to be taken over all values of $j$
for which $R_{ij}$ is included in $T^*$. It is clear that

$$Q^* = P_1^*\bar P_1^* + \cdots + P_{k-1}^*\bar P_{k-1}^*. \tag{31}$$


Let $y$ be any random variable which has a continuous probability density function,
say $\psi(y)\,dy$. Furthermore let $y_1, \ldots, y_n$ be $n$ independent observations
on $y$, arranged in order of increasing magnitude. Let $\psi_i(y)\,dy$ be the conditional
density function of $y$ under the condition that $y$ is restricted to the interval
$[y_{m_i}, y_{m_{i+1}}]$. Let

$$P' = \sum \int_{y_{m_i+j}}^{y_{m_i+j+1}} \psi(y)\,dy, \tag{32}$$

where the summation is taken over all pairs $i, j$ for which $R_{ij}$ is contained in $T^*$.
Let

$$P_i' = \int_{y_{m_i}}^{y_{m_{i+1}}} \psi(y)\,dy \quad \text{and} \quad \bar P_i' = \sum_j \int_{y_{m_i+j}}^{y_{m_i+j+1}} \psi_i(y)\,dy,$$

where the summation is to be taken over all values $j$ for which $R_{ij}$ is contained
in $T^*$. We obviously have

$$P' = P_1'\bar P_1' + \cdots + P_{k-1}'\bar P_{k-1}'. \tag{33}$$

It is easy to verify that (i) the joint distribution of $P_1', \ldots, P_{k-1}'$ is the same as
the joint distribution of $P_1^*, \ldots, P_{k-1}^*$; (ii) the distribution of $\bar P_i'$ is the same
as that of $\bar P_i^*$ ($i = 1, \ldots, k - 1$); (iii) the variates $\bar P_1', \ldots, \bar P_{k-1}'$ are independent
of each other and also of $P_1', \ldots, P_{k-1}'$; (iv) the variates $\bar P_1^*, \ldots, \bar P_{k-1}^*$
are independent of each other and also of $P_1^*, \ldots, P_{k-1}^*$. Hence it follows from
(31) and (33) that the distribution of $Q^*$ is the same as that of $P'$. Now we will
derive the distribution of $P'$. The expression $P'$ can be written in the following
form:

$$P' = \sum_{i=1}^{l} \int_{y_{r_i}}^{y_{s_i}} \psi(y)\,dy, \tag{34}$$
where $r_1, s_1, \ldots, r_l, s_l$ are some positive integers for which $1 \le r_1 < s_1 < r_2 <
s_2 < \cdots < r_l < s_l \le n$. Let

$$P'' = \sum_{i=1}^{l-1} \int_{y_{r_i}}^{y_{s_i}} \psi(y)\,dy + \int_{y_{s_{l-1}}}^{y_{s_{l-1}+s_l-r_l}} \psi(y)\,dy = \sum_{i=1}^{l-2} \int_{y_{r_i}}^{y_{s_i}} \psi(y)\,dy + \int_{y_{r_{l-1}}}^{y_{s_{l-1}+s_l-r_l}} \psi(y)\,dy. \tag{35}$$


For any fixed value $y_{s_{l-1}}$, denote by $\psi_1(y)$ the conditional probability density of $y$
under the restriction that $y < y_{s_{l-1}}$ and by $\psi_2(y)$ the conditional distribution of $y$
under the restriction that $y > y_{s_{l-1}}$. Let

$$P = \int_{-\infty}^{y_{s_{l-1}}} \psi(y)\,dy, \qquad P_1 = \sum_{i=1}^{l-1} \int_{y_{r_i}}^{y_{s_i}} \psi_1(y)\,dy,$$

$$P_2 = \int_{y_{r_l}}^{y_{s_l}} \psi_2(y)\,dy \quad \text{and} \quad P_3 = \int_{y_{s_{l-1}}}^{y_{s_{l-1}+s_l-r_l}} \psi_2(y)\,dy.$$

Then it follows from (34) and (35) that

$$P' = PP_1 + (1 - P)P_2, \qquad P'' = PP_1 + (1 - P)P_3. \tag{36}$$

For calculating the distributions of $P_2$ and $P_3$ we may consider the variates
$y_{s_{l-1}+1}, \ldots, y_n$ as $n - s_{l-1}$ independent observations drawn from a population
which has the distribution $\psi_2(y)\,dy$. Hence, the distribution of $P_2$ can be derived
from (13), and it is easy to verify that the distribution of $P_3$ is the same as
that of $P_2$. It is clear that $P_2$ is independent of $P$ and $P_1$. Similarly $P_3$ is
independent of $P$ and $P_1$. Hence, because of (36), the distribution of $P'$ must
be the same as that of $P''$.


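In the same way we find that the distribution of $P''$ is the same as the distribution
of

$$P''' = \sum_{i=1}^{l-2} \int_{y_{r_i}}^{y_{s_i}} \psi(y)\,dy + \int_{y_{s_{l-2}}}^{y_{s_{l-2}+s_{l-1}-r_{l-1}+s_l-r_l}} \psi(y)\,dy.$$

Thus, by induction we see that the distribution of $P'$ is the same as the distribution
of $P_0 = \int_{y_{r_1}}^{y_{r_1+s}} \psi(y)\,dy$, where $s = \sum_{i=1}^{l} (s_i - r_i)$. From (13)
it follows that the distribution of $P_0$ is given by

$$\frac{\Gamma(n+1)}{\Gamma(s)\,\Gamma(n - s + 1)}\,P_0^{s-1}(1 - P_0)^{n-s}\,dP_0.$$

Hence, we have proved that the distribution of $Q^*$ is given by (27).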
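The induction says that only the total number $s = \sum (s_i - r_i)$ of elementary intervals matters, not where those intervals sit. A Monte Carlo sketch of that statement follows; it takes $\psi$ uniform on $(0, 1)$, so that $\int_{y_r}^{y_s} \psi(y)\,dy = y_s - y_r$, and compares the mean and variance of one union of intervals with those of the density (27), i.e. a $\mathrm{Beta}(s, n - s + 1)$ law. The interval choice and names are ours and purely illustrative.

```python
import random


def union_coverage(us, intervals):
    """Sum of u_(s) - u_(r) over 1-based rank pairs (r, s) of sorted uniforms."""
    return sum(us[s - 1] - us[r - 1] for r, s in intervals)


rng = random.Random(3)
n = 25
intervals = [(2, 6), (9, 12), (15, 22)]        # r_1 < s_1 < r_2 < ... (illustrative)
s = sum(b - a for a, b in intervals)           # s = sum of (s_i - r_i) = 14
values = []
for _ in range(4000):
    us = sorted(rng.random() for _ in range(n))
    values.append(union_coverage(us, intervals))

mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
theory_mean = s / (n + 1)                                  # Beta(s, n - s + 1)
theory_var = s * (n - s + 1) / ((n + 1) ** 2 * (n + 2))
print(mean, theory_mean, var, theory_var)
```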
5. Summary of the results and numerical illustrations. I shall give here a
summary of the results obtained and a few illustrative examples. The multivariate
case being a straightforward extension of the bivariate case, I shall
discuss merely the latter. Consider a pair of random variables $x$ and $y$. Denote
by $f(x, y)\,dx\,dy$ the joint probability density function of $x$ and $y$ and suppose
that nothing is known about $f(x, y)$ except that it is continuous. A sample of
$n$ pairs of independent observations $(x_1, y_1), \ldots, (x_n, y_n)$ is drawn from this bivariate
population. The sample can be represented by $n$ points $p_1, \ldots, p_n$
in the plane $(x, y)$, $p_i$ being the point with the coordinates $x_i$ and $y_i$. In section 2
we have dealt with the problem of finding a rectangle $T$ in the plane $(x, y)$,
called tolerance region, such that we can state with high probability, say with
probability .98 or .99, that the proportion $Q$ of the bivariate universe included
in the rectangle $T$ is not less than a given number $b$, say not less than .98 or .99.
The rectangle $T$ is constructed as follows: Suppose that the points $p_1, \ldots, p_n$
are arranged in order of increasing magnitude of their abscissa values, i.e.
$x_1 < x_2 < \cdots < x_n$. We draw a vertical line $V_{r_1}$ through the point $p_{r_1}$ and a
vertical line $V_{s_1}$ through $p_{s_1}$, where $r_1$ and $s_1$ are positive integers such that
$1 \le r_1 \le s_1 - 3$ and $s_1 \le n$. We consider the set $S$ consisting of the points
$p_{r_1+1}, \ldots, p_{s_1-1}$ which lie between the vertical lines $V_{r_1}$ and $V_{s_1}$. We draw a
horizontal line $H_{r_2}$ through the point of $S$ which has the $r_2$-th smallest ordinate
in $S$. Finally a horizontal line $H_{s_2}$ is drawn through the point of $S$ which has
the $s_2$-th smallest ordinate in $S$. The values $r_2$ and $s_2$ are positive integers for
which $r_2 < s_2$. The tolerance region $T$ is the rectangle determined by the lines
$V_{r_1}$, $V_{s_1}$, $H_{r_2}$ and $H_{s_2}$. The probability $p$ that at least the proportion
$b$ ($0 < b < 1$) of the universe is included in $T$ is given by


$$p = \frac{\Gamma(n+1)}{\Gamma(s_2 - r_2)\,\Gamma(n - s_2 + r_2 + 1)} \int_b^1 Q^{s_2-r_2-1}(1 - Q)^{n-s_2+r_2}\,dQ. \tag{37}$$

It is known that if a random variable $v$ ($0 < v < 1$) has the distribution

$$\frac{\Gamma(c + d)}{\Gamma(c)\,\Gamma(d)}\,v^{c-1}(1 - v)^{d-1}\,dv \tag{38}$$

and $2c$ and $2d$ are positive integers, then $\frac{2c}{2d}\cdot\frac{1 - v}{v}$ has the $F$-distribution (analysis
of variance distribution) with $2d$ and $2c$ degrees of freedom. Thus,

$$F = \frac{2(s_2 - r_2)}{2(n - s_2 + r_2 + 1)}\cdot\frac{1 - Q}{Q} \tag{39}$$

has the $F$-distribution with $2(n - s_2 + r_2 + 1)$ and $2(s_2 - r_2)$ degrees of freedom.
From (37) it follows that $p$ is equal to the probability that

$$F \le \frac{2(s_2 - r_2)}{2(n - s_2 + r_2 + 1)}\cdot\frac{1 - b}{b},$$

where $F$ has the analysis of variance distribution with $2(n - s_2 + r_2 + 1)$ and
$2(s_2 - r_2)$ degrees of freedom. For the case $r_1 = 1$, $s_1 = n$, $r_2 = 1$ and
$s_2 = n - 2$, the following table gives the value of the sample size $n$ which is necessary
for having the probability $p$ that at least the proportion $b$ of the universe is
included in the tolerance rectangle $T$.



              b = .97    b = .975    b = .98    b = .985    b = .99
p = .99          332        398         499        608         1001
p = .95          250        309         385        515          771

Thus, if we want the probability to be .99 that the tolerance region will include 
at least 98 per cent of the universe, the sample size must be 499. 
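Since (37) involves a beta distribution with integer parameters, $p$ can also be computed exactly as a binomial tail instead of through $F$ tables. With $r_1 = 1$, $s_1 = n$, $r_2 = 1$, $s_2 = n - 2$ the exponents are $s_2 - r_2 = n - 3$ and $n - s_2 + r_2 + 1 = 4$, so $p = P(\mathrm{Binomial}(n, 1 - b) \ge 4)$. The sketch below (helper names ours) scans $n$ upward. By our own evaluation it reproduces several printed entries, for example 332 and 499 in the $p = .99$ row, but not every cell (for $b = .985$, $p = .99$ it gives 667 rather than the printed 608); we have not tried to reconcile these differences.

```python
import math


def coverage_probability(n, b):
    """p of (37) for r1 = 1, s1 = n, r2 = 1, s2 = n - 2.

    Q then has density (19) with s2 - r2 = n - 3, so
    P(Q >= b) = P(Binomial(n, b) <= n - 4) = 1 - P(Binomial(n, 1 - b) <= 3).
    """
    q = 1.0 - b
    return 1.0 - sum(math.comb(n, k) * q**k * b ** (n - k) for k in range(4))


def required_n(b, p):
    """Smallest n with coverage_probability(n, b) >= p (it increases with n)."""
    n = 5
    while coverage_probability(n, b) < p:
        n += 1
    return n


for p in (0.99, 0.95):
    print(p, [required_n(b, p) for b in (0.97, 0.975, 0.98, 0.985, 0.99)])
```

Summing the short complementary tail keeps the arithmetic small and avoids converting huge binomial coefficients to floating point.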

In section 4 tolerance regions are considered which are composed of several
rectangles. Such a tolerance region may be more favorable than a single rectangle
if $x$ and $y$ are highly correlated. As an illustration we consider tolerance
regions $T^*$ constructed as follows: Suppose that $n$ is divisible by 4 and the sample
points $p_1, \ldots, p_n$ are arranged in order of increasing magnitude of their abscissa
values. We draw the vertical lines $V_0$, $V_1$, $V_2$, $V_3$ and $V_4$ through the points
$p_1$, $p_{n/4}$, $p_{n/2}$, $p_{3n/4}$ and $p_n$. Let $R_i$ ($i = 1, 2, 3, 4$) be the rectangle determined
by the vertical lines $V_{i-1}$ and $V_i$ and the horizontal lines $H_i$ and $H_i'$, where $H_i$
and $H_i'$ are defined as follows: consider only the points which lie between the
two vertical lines $V_{i-1}$ and $V_i$ (points on the vertical lines are excluded). From
these select the point with the smallest and the point with the largest ordinate.
The lines $H_i$ and $H_i'$ are the horizontal lines which go through these two points,
respectively. The tolerance region $T^*$ is composed of the four rectangles $R_1$,
$R_2$, $R_3$ and $R_4$. The number of rectangles $R_{ij}$ (defined in section 4) included in
$T^*$ is equal to $s = n - 9$. Thus, according to the results of section 4, the probability
distribution of the proportion $Q^*$ of the universe included in the region
$T^*$ is given by

$$\frac{\Gamma(n+1)}{\Gamma(n - 9)\,\Gamma(10)}\,Q^{*\,n-10}(1 - Q^*)^{9}\,dQ^*.$$

Numerical calculations show that for $n = 1000$ the probability is .99 that at least
98.1 per cent of the universe will be included in the tolerance region $T^*$.
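The last statement can be checked the same way. With $s = n - 9$, (27) is a beta density and $P(Q^* \ge b)$ equals $P(\mathrm{Binomial}(n, 1 - b) \ge 10)$. The helper below (name ours) sums the short complementary tail, which avoids overflow for $n = 1000$; by our evaluation it gives about .992 for $n = 1000$, $b = .981$, consistent with the stated .99.

```python
import math


def prob_at_least(n, s, b):
    """P(Q* >= b) when Q* has density (27), i.e. Q* ~ Beta(s, n - s + 1).

    Uses P(Q* >= b) = P(Binomial(n, b) <= s - 1)
                    = 1 - P(Binomial(n, 1 - b) <= n - s),
    so only the short tail of n - s + 1 terms is summed.
    """
    q = 1.0 - b
    return 1.0 - sum(math.comb(n, k) * q**k * b ** (n - k) for k in range(n - s + 1))


n = 1000
print(prob_at_least(n, n - 9, 0.981))   # region T* of section 5, s = n - 9
```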



ASYMPTOTIC FORMULAS FOR SIGNIFICANCE LEVELS OF CERTAIN DISTRIBUTIONS

By Alfred M. Peiser
Cornell University

1. Introduction. The purpose of this paper is to derive asymptotic formulas
for the significance levels, or per cent points, of certain well-known statistical
distributions.¹ Although we restrict ourselves here to two distributions, those
of Chi-Square and of Student's $t$, it will be apparent that the methods used are
applicable to many other distributions as well.

The following results are obtained. Let $y_p$ be the $p$ per cent point of the
normal distribution, that is, the distribution defined by

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt, \tag{1.1}$$

so that

$$\Phi(y_p) = 1 - p. \tag{1.2}$$

If $\chi^2_{p,n}$ and $t_{p,n}$ denote the $p$ per cent points of the Chi-Square and Student's $t$
distributions with $n$ degrees of freedom respectively, then

$$\chi^2_{p,n} = n + y_p\sqrt{2n} + \tfrac23(y_p^2 - 1) + \frac{y_p^3 - 7y_p}{9\sqrt{2n}} + o(n^{-1/2}) \tag{1.3}$$

and

$$t_{p,n} = y_p + \frac{y_p^3 + y_p}{4n} + o(n^{-1}). \tag{1.4}$$

These formulas approximate the true values of $\chi^2_{p,n}$ and $t_{p,n}$ to a high degree of
accuracy. Tables of comparative values for several values of $p$ and $n$ are given
in Section 4.
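A small sketch that evaluates $n + y_p\sqrt{2n} + \frac23(y_p^2 - 1) + (y_p^3 - 7y_p)/(9\sqrt{2n})$ and $y_p + (y_p^3 + y_p)/(4n)$, which is our reading of the main terms of (1.3) and (1.4); the reading is supported by the fact that it reproduces formula values printed in the tables of Section 4 (for example 23.253 for $n = 10$, $p = .01$). It uses only `statistics.NormalDist` from the standard library; the function names are ours.

```python
import math
from statistics import NormalDist


def y_point(p):
    """Normal point y_p with Phi(y_p) = 1 - p, as in (1.2)."""
    return NormalDist().inv_cdf(1 - p)


def chi2_point(p, n):
    """Main terms of (1.3) for chi^2_{p,n}, remainder dropped."""
    y = y_point(p)
    root = math.sqrt(2 * n)
    return n + y * root + 2 / 3 * (y * y - 1) + (y**3 - 7 * y) / (9 * root)


def t_point(p, n):
    """Main terms of (1.4) for t_{p,n}, remainder dropped."""
    y = y_point(p)
    return y + (y**3 + y) / (4 * n)


print(round(chi2_point(0.01, 10), 3), round(t_point(0.05, 10), 3))
```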

We shall need the following theorem due to Cramér [3, p. 81; see also pp.
86-87].

Theorem 1: Let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed
random variables having an absolutely continuous distribution function with
mean value zero, dispersion $\sigma$ and finite fifth absolute moment. Let $H_n(x)$ be the
distribution function of $(X_1 + \cdots + X_n)/(\sigma\sqrt n)$, and let $\lambda_r$ denote the
$r$-th semi-invariant of $H_n(x)$. Then

$$\Phi(x) - H_n(x) = \frac{\lambda_3}{3!\sqrt n}\,\Phi^{(3)}(x) - \frac{\lambda_4}{4!\,n}\,\Phi^{(4)}(x) - \frac{10\lambda_3^2}{6!\,n}\,\Phi^{(6)}(x) + O(n^{-3/2}). \tag{1.5}$$

¹ This problem was proposed to the author by J. H. Curtiss.
2. The Chi-Square distribution. A random variable $X$ is said to be distributed
according to Chi-Square with $n$ degrees of freedom ($X = \chi^2_n$) if its distribution
function is

$$F_n(x) = \begin{cases} \dfrac{1}{2^{n/2}\Gamma(n/2)} \displaystyle\int_0^x t^{\frac n2 - 1} e^{-t/2}\,dt, & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{2.1}$$

The variable $(\chi^2_n - n)/\sqrt{2n}$ then has the distribution function

$$H_n(x) = F_n(n + x\sqrt{2n}). \tag{2.2}$$


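If we write

$$\chi^2_{p,n} = n + y_p\sqrt{2n} + a_n, \tag{2.3}$$

so that

$$F_n(\chi^2_{p,n}) = 1 - p, \tag{2.4}$$

and let $z_{pn} = y_p + a_n/\sqrt{2n}$, then $H_n(z_{pn}) = 1 - p$, and it follows from (1.1)
and (1.2) that

$$\Phi(z_{pn}) - H_n(z_{pn}) = \Phi(z_{pn}) - \Phi(y_p) = \frac{a_n}{\sqrt{2\pi}\sqrt{2n}}\,e^{-\frac12\left(y_p + \theta a_n/\sqrt{2n}\right)^2}, \tag{2.5}$$

where $0 < \theta < 1$. Then by a theorem of Liapounoff's [3, p. 77],

$$\left|\Phi(z_{pn}) - H_n(z_{pn})\right| \le \frac{K\log n}{\sqrt n},$$

where $K$ denotes a constant independent of $n$. But if $\lim_{n\to\infty} |a_n|/\sqrt{2n} = \infty$,
then $\lim_{n\to\infty} H_n(z_{pn})$ is either 0 or 1. Hence $a_n = o(\sqrt n)$.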

Fisher [1, p. 81] has suggested the use of

$$\chi^2_{p,n} \approx \tfrac12\left(y_p + \sqrt{2n - 1}\right)^2.$$

A closer approximation,

$$\chi^2_{p,n} \approx n\left(1 - \frac{2}{9n} + y_p\sqrt{\frac{2}{9n}}\right)^3,$$

has been obtained by Wilson and Hilferty [2]. It is interesting to note that,
according to (1.3), this last approximation is correct to terms of the zero-th
order in $n$.
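The comparison can be made concrete numerically. The sketch below (function names ours) takes Fisher's approximation as $\frac12(y_p + \sqrt{2n - 1})^2$ and the Wilson-Hilferty approximation in its standard form $n(1 - \frac{2}{9n} + y_p\sqrt{2/(9n)})^3$, and compares both with the terms of (1.3) through order $n^0$, namely $n + y_p\sqrt{2n} + \frac23(y_p^2 - 1)$. For large $n$ the Wilson-Hilferty value approaches these terms, while Fisher's value stays off by about $-(y_p^2 - 1)/6$, which is what the statement above asserts.

```python
import math
from statistics import NormalDist


def fisher(p, n):
    y = NormalDist().inv_cdf(1 - p)
    return 0.5 * (y + math.sqrt(2 * n - 1)) ** 2


def wilson_hilferty(p, n):
    y = NormalDist().inv_cdf(1 - p)
    return n * (1 - 2 / (9 * n) + y * math.sqrt(2 / (9 * n))) ** 3


def zero_order(p, n):
    """Terms of (1.3) through order n^0."""
    y = NormalDist().inv_cdf(1 - p)
    return n + y * math.sqrt(2 * n) + 2 / 3 * (y * y - 1)


for n in (10, 1000, 10**6):
    print(n, fisher(0.01, n) - zero_order(0.01, n),
          wilson_hilferty(0.01, n) - zero_order(0.01, n))
```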

We apply Theorem 1 to the variables $X_j = (\chi^2_1 - 1)/\sqrt2$, $j = 1, 2, \ldots$, formed from
independent Chi-Square variables with one degree of freedom. Then $\sigma = 1$, and, by the
additive property of the Chi-Square distribution [3, p. 45], $H_n(x)$, the distribution
function of the variable $(X_1 + \cdots + X_n)/\sqrt n$, is related to $F_n(x)$ by (2.2). Thus,
$\lambda_3 = 2\sqrt2$. It follows from (1.5) and (2.3) that

$$\lim_{n\to\infty} \sqrt{2n}\left[\Phi(z_{pn}) - H_n(z_{pn})\right] = \lim_{n\to\infty} \frac{2}{3\sqrt{2\pi}}(z_{pn}^2 - 1)e^{-\frac12 z_{pn}^2} = \frac{2}{3\sqrt{2\pi}}(y_p^2 - 1)e^{-\frac12 y_p^2}, \tag{2.6}$$

since $a_n = o(\sqrt n)$. Then by (2.5) and (2.6)

$$\lim_{n\to\infty} a_n = \tfrac23(y_p^2 - 1).$$

According to (2.3) we may now write

$$\chi^2_{p,2n} = 2n + 2y_p\sqrt n + 2r_p + 2b_n,$$

where

$$r_p = \tfrac13(y_p^2 - 1), \tag{2.7}$$

and $b_n = o(1)$. A simple change of variables in (2.1) yields

(2« [l + ^- + 5]"' *. 


If we let 




Jn = r dv, 

'>Vp V2 t 


( 2 . 10 ) 


V2jr 


where 8n = o(l). By (1.2) and (2.4), 


( 2 . 11 ) 


Jn — ^ (vp + ~ f^2n(Xj>.Sn). 


Using Stirling's formula for $\Gamma(a + 1)$ in (2.8), (2.11) becomes

+ (« - 1 ) log (l + ^ + t') + l’ - - r,} -l] , 


r” 

/ <>» 

Jv+— 'v/9-jr 


\/22r 

V" 
where - i ^ - r. - i + l’ + .% - 0 + ^ g - „ - + /.(„),

nfn(v) being dominated by P(| v |), where P is a polynomial in v independent of 
n, and/„(t;) = 0(n“®^^). Then 

»“> -'{n * l) • a* • Ch jaK — " 1 * 




+ 


f , g„(v) dv+f\ (± 4) 

Jt,p+-;LV2T V2x \j-3 ;! / 

V 7* V w 


where $g_n(v)$ has the same properties given above for $f_n(v)$. If we call these last
integrals $K_1$, $K_2$ and $K_3$ respectively, we see that


(2.13) 


lim nKi 


=/■ 


00 


\/27r » 


lim ngn{v) dv = 0. 


Also, since An , j = 1, 2, • • • is dominated by Py(| v |), Pj(v) being a poly- 
nomial in V independent of n, we see that 


« pw { A \} '* 1 / fen \2 ( 2 /p T — 7 = I 

^ ^ M dv<j:e- vV 

;-3 Ji,p+ \/27r ;-3 ^ 

v» 

where $Q_j$ is a polynomial. Since this last sum converges, we have

00 ^00 — Jj ?2 

(2.14) ” 


. \2 Qjlt/p + 




nK, = tf dv, 

V n 

and by the uniform convergence of (2.14), 

00 pOO — Jv* 

(2.15) lim nKi = ^ I 7—7— li™ dv = 0, 

n-»oo j-3 J Vp jl\/2w 


since An = 0(n ^^^). 

Integrating by parts we obtain 


T 1 K 2 


vs* 


K»**)’(| 




lim nKi = 0. 

n— *00 


and since K = 0 ( 1 ) 
(2.16) 
Further integration by parts and the use of (2.7) yields

(2.17) lim nKi = (yl - 7y,). 

n-*oo 36\/ 2ir 

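Then, by (2.10), (2.12), (2.13), (2.15), (2.16) and (2.17),

$$\lim_{n\to\infty} b_n\sqrt n = \tfrac{1}{36}(y_p^3 - 7y_p),$$

so that

$$\chi^2_{p,2n} = 2n + 2y_p\sqrt n + \tfrac23(y_p^2 - 1) + \frac{y_p^3 - 7y_p}{18\sqrt n} + o(n^{-1/2}).$$

Equation (1.3) now follows at once.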


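3. Student's t. If the random variable $Y_n$ has the distribution function
$\Phi(x/\sqrt n)$, then $t_n = Y_n/\chi_n$ is distributed according to Student's distribution
for $n$ degrees of freedom and has the distribution function

$$G_n(x) = \int_{-\infty}^{x} \frac{\Gamma[\frac12(n+1)]}{\sqrt{n\pi}\,\Gamma[\frac12 n]}\left(1 + \frac{v^2}{n}\right)^{-\frac12(n+1)} dv.$$

If $\sigma = \sqrt{n/(n-2)}$, the variable $t_n/\sigma$ then has the distribution function

$$H_n(x) = G_n(\sigma x). \tag{3.1}$$

If we write

$$t_{p,n} = y_p + a_n, \tag{3.2}$$

so that

$$G_n(t_{p,n}) = 1 - p, \tag{3.3}$$

and let $z_{pn} = t_{p,n}/\sigma$, then $H_n(z_{pn}) = 1 - p$, and it follows from (1.1) and (1.2)
that

$$\Phi(z_{pn}) - H_n(z_{pn}) = \Phi(z_{pn}) - \Phi(y_p) = \frac{z_{pn} - y_p}{\sqrt{2\pi}}\,e^{-\frac12[y_p + \theta(z_{pn} - y_p)]^2}, \tag{3.4}$$

where $0 < \theta < 1$. Then by Liapounoff's Theorem [3, p. 77],

$$\left|\Phi(z_{pn}) - H_n(z_{pn})\right| \le \frac{K\log n}{\sqrt n},$$

where $K$ denotes a constant independent of $n$. But if $\lim_{n\to\infty} |a_n| = \infty$, then
$\lim_{n\to\infty} H_n(z_{pn})$ is either 0 or 1. Hence $a_n = o(1)$.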
We apply Theorem 1 to the variables $X_j = Y_j/\chi_n$, $j = 1, 2, \ldots$. Then
$\sigma = \sqrt{n/(n-2)}$, and by the additive property of the normal distribution,
$H_n(x)$, the distribution function of $(X_1 + \cdots + X_n)/(\sigma\sqrt n)$, satisfies the
relation (3.1). Thus $\lambda_3 = 0$ and $\lambda_4 = 6n/(n - 4)$. It follows from (1.5) and
(3.2) that

$$\lim_{n\to\infty} n\left[\Phi(z_{pn}) - H_n(z_{pn})\right] = \lim_{n\to\infty} \frac{n}{4(n - 4)\sqrt{2\pi}}\,(z_{pn}^3 - 3z_{pn})\,e^{-\frac12 z_{pn}^2} = \frac{1}{4\sqrt{2\pi}}\,(y_p^3 - 3y_p)\,e^{-\frac12 y_p^2}, \tag{3.5}$$

since $a_n = o(1)$. By (3.4) and (3.5) we have

$$\lim_{n\to\infty} n\left[y_p\left(\frac1\sigma - 1\right) + \frac{a_n}{\sigma}\right] = \frac{y_p^3 - 3y_p}{4}.$$

But $\lim_{n\to\infty} n(1 - \sigma)/\sigma = -1$, so that

$$\lim_{n\to\infty} n a_n = \frac{y_p^3 + y_p}{4}.$$

Hence

$$t_{p,n} = y_p + \frac{y_p^3 + y_p}{4n} + o(n^{-1}),$$

and equation (1.4) follows at once.


4. Tables. The following tables compare the true values of $\chi^2_{p,n}$ and $t_{p,n}$
with those obtained from (1.3) and (1.4). The true values [4], [5] (to three
decimal places) are given in the rows marked "true"; the other rows are computed
from the formulas.

TABLE I

$\chi^2_{p,n}$

n                 p = .01    p = .05    p = .1     p = .5     p = .9
10     (1.3)      23.253     18.318     15.989      9.333      4.875
       true       23.209     18.307     15.987      9.342      4.865
30     (1.3)      50.908     43.777     40.257     29.333     20.600
       true       50.892     43.773     40.256     29.336     20.599
50     (1.3)      76.163     67.507     63.168     49.333     37.689
       true       76.154     67.505     63.167     49.335     37.689
100    (1.3)     135.811    124.343    118.499     99.333     82.358
       true      135.807    124.342    118.498     99.334     82.368
TABLE II

$t_{p,n}$

n                 p = .0125  p = .025   p = .05    p = .125   p = .25
10     (1.4)      2.679      2.197      1.797      1.212      0.700
       true       2.634      2.228      1.813      1.221      0.700
30     (1.4)      2.354      2.039      1.696      1.171      0.683
       true       2.360      2.042      1.697      1.173      0.683
60     (1.4)      2.298      2.000      1.670      1.161      0.679
       true       2.299      2.000      1.671      1.162      0.679
120    (1.4)      2.270      1.980      1.658      1.156      0.677
       true       2.270      1.980      1.658      1.166      0.677
GENERALIZATION OF POINCARÉ'S FORMULA IN THE THEORY OF PROBABILITY

By Kai Lai Chung
Tsing Hua University, Kunming, China


Let $p_{[m]}(1, \ldots, n)$ ($0 \le m \le n$) denote the probability of the occurrence of
exactly $m$ events among the $n$ arbitrary events $E_1, \ldots, E_n$; and $p_m(1, \ldots, n)$
($1 \le m \le n$) that of at least $m$. Let $p(\nu_1 \cdots \nu_i)$ ($1 \le i \le n$), where $(\nu_1 \cdots \nu_i)$
is a combination (without repetition) out of $(1, \ldots, n)$, denote the probability
of the occurrence of $E_{\nu_1}, \ldots, E_{\nu_i}$ (without regard to the other events); and

$$S_0 = 1, \qquad S_i = \sum p(\nu_1 \cdots \nu_i),$$

where the summation extends to all the combinations with $i$ members out of
$(1, \ldots, n)$.

Then Poincaré's formula may be written as follows:

$$p_{[0]}(1, \ldots, n) = \sum_{i=0}^{n} (-1)^i S_i.$$

An equivalent formula is:

$$p_1(1, \ldots, n) = \sum_{i=1}^{n} (-1)^{i-1} S_i.$$


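The following conventions concerning the binomial coefficients are made:

$$\binom{a}{b} = 0 \quad \text{if } a < b \text{ or } b < 0.$$

Two generalizations, possibly due to de Mises, are

$$p_{[m]}(1, \dots, n) = \sum_{i=m}^{n} (-1)^{i-m} \binom{i}{m} S_i,$$

$$p_m(1, \dots, n) = \sum_{i=m}^{n} (-1)^{i-m} \binom{i-1}{m-1} S_i .$$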

We notice that the probabilities appearing on the left-hand sides of these 
formulas are symmetrical with respect to the set of suffixes (1, • • • , n), and the 
sums on the right-hand sides are symmetrical in the same way. 

As a natural generalization let us consider a probability which is symmetrical with respect to certain sub-sets of $(1, \dots, n)$. We divide the $n$ events into $r$ sets:

$$E_{\nu_{11}}, \dots, E_{\nu_{1n_1}}; \quad E_{\nu_{21}}, \dots, E_{\nu_{2n_2}}; \quad \dots; \quad E_{\nu_{r1}}, \dots, E_{\nu_{rn_r}},$$

where $n_1 + n_2 + \dots + n_r = n$. And we ask for the probability that out of the first set of $n_1$ events exactly $m_1$ events occur; and out of the second set of $n_2$ events exactly $m_2$ events occur; and so on; and finally, out of the $r$th set of $n_r$ events exactly $m_r$ events occur. When this problem is solved, the analogous problem in which we replace some of the words "exactly" by "at least" can also be solved.

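We denote the required probability by the left-hand side of the following generalized Poincaré's formula:

$$p_{[m_1],[m_2],\dots,[m_r]}(\nu_{11} \cdots \nu_{1n_1}; \nu_{21} \cdots \nu_{2n_2}; \dots; \nu_{r1} \cdots \nu_{rn_r}) = \sum_{i_1=m_1}^{n_1} \sum_{i_2=m_2}^{n_2} \cdots \sum_{i_r=m_r}^{n_r} (-1)^{i_1+\dots+i_r-m_1-\dots-m_r} \binom{i_1}{m_1}\binom{i_2}{m_2}\cdots\binom{i_r}{m_r} S_{i_1,i_2,\dots,i_r}, \tag{1}$$

where

$$S_{i_1,i_2,\dots,i_r} = \sum p(\alpha_{11} \cdots \alpha_{1i_1}\,\alpha_{21} \cdots \alpha_{2i_2} \cdots \alpha_{r1} \cdots \alpha_{ri_r}),$$

the summation extending to all those combinations of $\alpha$'s such that for every $k = 1, \dots, r$, $(\alpha_{k1} \cdots \alpha_{ki_k})$ is a combination of $i_k$ members out of $(\nu_{k1} \cdots \nu_{kn_k})$.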
Proof: Let $p_{[\alpha_1 \cdots \alpha_g]}$ denote the probability of the occurrence of the events $E_{\alpha_1}, \dots, E_{\alpha_g}$, and these only, out of $E_1, \dots, E_n$. It is well known and also easily seen that

$$p(\alpha_1 \cdots \alpha_g) = \sum_{h=0}^{n-g} \sum p_{[\alpha_1 \cdots \alpha_g \beta_1 \cdots \beta_h]},$$

where for a fixed $h$ the second summation extends to all the combinations $(\beta_1 \cdots \beta_h)$ of $h$ members out of the "difference set" $(1, \dots, n) - (\alpha_1 \cdots \alpha_g)$. Now let each $p$ in each $S$ on the right-hand side of (1) be decomposed into a sum of $p_{[\cdots]}$'s in the last-written way. Consider a fixed

$$p_{[\mu_{11} \cdots \mu_{1j_1}\,\mu_{21} \cdots \mu_{2j_2} \cdots \mu_{r1} \cdots \mu_{rj_r}]},$$

where for every $k = 1, \dots, r$, $(\mu_{k1}, \dots, \mu_{kj_k})$ is a combination of $j_k$ members out of $(\nu_{k1} \cdots \nu_{kn_k})$. It appears once in exactly $\binom{j_1}{i_1}\binom{j_2}{i_2}\cdots\binom{j_r}{i_r}$ terms in $S_{i_1,i_2,\dots,i_r}$. Hence its total contribution to the right-hand side of (1) is

$$\sum_{i_1=m_1}^{n_1} \cdots \sum_{i_r=m_r}^{n_r} (-1)^{i_1+\dots+i_r-m_1-\dots-m_r} \binom{i_1}{m_1}\binom{j_1}{i_1} \cdots \binom{i_r}{m_r}\binom{j_r}{i_r}\, p_{[\cdots]} = \prod_{k=1}^{r}\left[\binom{j_k}{m_k}\sum_{h_k=0}^{j_k-m_k}(-1)^{h_k}\binom{j_k-m_k}{h_k}\right] p_{[\cdots]},$$

and since

$$\binom{j_k}{m_k}\sum_{h_k=0}^{j_k-m_k}(-1)^{h_k}\binom{j_k-m_k}{h_k} = \begin{cases} 1 & \text{if } j_k = m_k, \\ 0 & \text{otherwise,} \end{cases}$$

this contribution equals $p_{[\cdots]}$ if $j_k = m_k$ for every $k = 1, \dots, r$, and $0$ otherwise.

Therefore, after the decompositions and the collecting of terms, the only $p$'s remaining on the right-hand side of (1) are those in which $j_k = m_k$ for every $k = 1, \dots, r$. Thus the right-hand side is reduced to

$$\sum p_{[\mu_{11} \cdots \mu_{1m_1}\,\mu_{21} \cdots \mu_{2m_2} \cdots \mu_{r1} \cdots \mu_{rm_r}]},$$

where the summation extends to all those combinations of $\mu$'s such that for every $k = 1, \dots, r$, $(\mu_{k1} \cdots \mu_{km_k})$ is a combination of $m_k$ members out of $(\nu_{k1} \cdots \nu_{kn_k})$. This is clearly equal to the left-hand side of (1). Q. E. D.

If we replace "exactly $m_k$" by "at least $m_k$" in the definition of the probability just considered, we replace in our notation the square-bracketed $[m_k]$ by an unbracketed $m_k$, and we replace $\binom{i_k}{m_k}$ in our formula by $\binom{i_k-1}{m_k-1}$. This is proved as before, noting that we have

$$\sum_{i_k=m_k}^{j_k} (-1)^{i_k-m_k}\binom{i_k-1}{m_k-1}\binom{j_k}{i_k} = 1 \quad \text{for } j_k = m_k, \dots, n_k,$$

an identity which can be proved by induction on $j_k$.
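Formula (1) and its "at least" variant can be checked by brute force on a small finite sample space, where both sides are computable exactly. The sketch below is an illustration under assumptions of our own (a randomly generated finite probability space and hypothetical helper names), not part of the proof; exact rational arithmetic means the two sides must agree exactly.

```python
import itertools
import random
from fractions import Fraction
from math import comb


def random_space(num_points, num_events, seed=1):
    """Hypothetical test space: point probabilities and events as sets of points."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(num_points)]
    total = sum(weights)
    probs = [Fraction(w, total) for w in weights]
    events = [{w for w in range(num_points) if rng.random() < 0.5}
              for _ in range(num_events)]
    return probs, events


def p_joint(probs, events, idx):
    """p(alpha_1 ... alpha_g): all listed events occur, others unrestricted."""
    return sum((p for w, p in enumerate(probs)
                if all(w in events[j] for j in idx)), Fraction(0))


def lhs(probs, events, groups, ms, at_least=False):
    """Direct probability that group k has exactly (or at least) ms[k] occurrences."""
    total = Fraction(0)
    for w, p in enumerate(probs):
        counts = [sum(w in events[j] for j in g) for g in groups]
        if all((c >= m) if at_least else (c == m) for c, m in zip(counts, ms)):
            total += p
    return total


def rhs(probs, events, groups, ms, at_least=False):
    """Right-hand side of (1); with at_least=True (all ms >= 1), C(i,m) -> C(i-1,m-1)."""
    total = Fraction(0)
    for ks in itertools.product(*(range(m, len(g) + 1) for g, m in zip(groups, ms))):
        coef = (-1) ** (sum(ks) - sum(ms))
        for i, m in zip(ks, ms):
            coef *= comb(i - 1, m - 1) if at_least else comb(i, m)
        # S_{i_1,...,i_r}: choose i_k members from each group k.
        s = Fraction(0)
        for parts in itertools.product(*(itertools.combinations(g, i)
                                         for g, i in zip(groups, ks))):
            s += p_joint(probs, events, [j for part in parts for j in part])
        total += coef * s
    return total


if __name__ == "__main__":
    probs, events = random_space(num_points=12, num_events=5)
    groups = [(0, 1, 2), (3, 4)]  # r = 2 sets with n_1 = 3, n_2 = 2
    for ms in itertools.product(range(4), range(3)):
        assert lhs(probs, events, groups, ms) == rhs(probs, events, groups, ms)
    for ms in itertools.product(range(1, 4), range(1, 3)):
        assert lhs(probs, events, groups, ms, True) == rhs(probs, events, groups, ms, True)
    print("formula (1) and the 'at least' variant agree")
```

With a single group containing all events and $m = 0$, the same code reduces to Poincaré's formula itself.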

A parallel generalization of Poincaré's formula is as follows: we ask for the
probability that either out of the first set exactly $m_1$ events occur; or out of the
second exactly $m_2$; …; or, finally, out of the $r$th set exactly $m_r$. That is,
instead of repeated conjunctions we may consider repeated disjunctions. We
denote the required probability by the left-hand side of (2); it is then given in
terms of the $p$'s defined above in (1) by the right-hand side below:


(2) > * * ’ > *'lni *'21 * ’ * *' 2 n 2 J ’ * * J *'rl • * * *'rnr) 


— Pmi ,m2 »• • • i»»r iP»»l +lt»»*2+l.* * *» »»»r+l • 

Other events symmetrical with respect to each of the sub-sets, in whose
definition the words "and", "or", "exactly", "at least" appear arbitrarily, may be
considered.

Lastly, we only mention that, as a first application, formula (1) can be used
to establish the formula

$$(n - k)\sum p_m(\nu_1 \cdots \nu_k) = (k + 1 - m)\sum p_m(\nu_1 \cdots \nu_{k+1}) + m\sum p_{m+1}(\nu_1 \cdots \nu_{k+1}),$$

first obtained by P. L. Hsu. For its significance we may refer to [1], and to a
continuation of that paper to be published shortly.
TABLES FOR TESTING RANDOMNESS OF GROUPING
IN A SEQUENCE OF ALTERNATIVES 

By Frieda S. Swed and C. Eisenhart 
University of Wisconsin 

When two different kinds of objects are arranged along a line, they will form
two or more distinct groups of like objects. Thus, in the arrangement aabbbab
there are 3 a's and 4 b's forming 4 groups. In general, if there are $m$ objects
of one kind and $n$ objects of another kind, there are in all

$$\binom{m+n}{m} = \binom{m+n}{n}$$

different arrangements possible. There will be no loss of generality if we assume
that $m \le n$.

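If $u$ is defined to be the number of distinct groups of like objects in any one
arrangement, then the proportion of arrangements yielding $u'$ or fewer groups is¹

$$P\{u \le u'\} = \frac{1}{\binom{m+n}{m}} \sum_{u=2}^{u'} f_u, \tag{1}$$

where

$$f_u = 2\binom{m-1}{k-1}\binom{n-1}{k-1} \quad \text{when } u = 2k, \text{ i.e. } u \text{ is even},$$

and

$$f_u = \binom{m-1}{k-1}\binom{n-1}{k-2} + \binom{m-1}{k-2}\binom{n-1}{k-1} \quad \text{when } u = 2k - 1, \text{ i.e. } u \text{ is odd},$$

for $k = 1, 2, \dots, m + 1$. For example, if $m = n = 5$, then

$$P\{u = 2\} = \frac{f_2}{\binom{10}{5}} = \frac{2\binom{4}{0}\binom{4}{0}}{252} = \frac{2}{252}, \qquad P\{u = 3\} = \frac{f_3}{\binom{10}{5}} = \frac{\binom{4}{1}\binom{4}{0} + \binom{4}{0}\binom{4}{1}}{252} = \frac{8}{252}.$$

In a random arrangement, (1) is the probability of $u \le u'$.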
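Formula (1) can be evaluated exactly with integer binomial coefficients. The Python sketch below (function names are ours) reproduces the $m = n = 5$ example and the probabilities quoted later for the plant, lunch-counter and steer examples.

```python
from fractions import Fraction
from math import comb


def f_u(u: int, m: int, n: int) -> int:
    """Number of arrangements of m objects of one kind and n of another with u groups."""
    if u < 2:
        return 0
    if u % 2 == 0:                      # u = 2k
        k = u // 2
        return 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
    k = (u + 1) // 2                    # u = 2k - 1, so k >= 2 here
    return (comb(m - 1, k - 1) * comb(n - 1, k - 2)
            + comb(m - 1, k - 2) * comb(n - 1, k - 1))


def prob_at_most(u_prime: int, m: int, n: int) -> Fraction:
    """P{u <= u'} for a random arrangement, as in formula (1)."""
    total = sum(f_u(u, m, n) for u in range(2, u_prime + 1))
    return Fraction(total, comb(m + n, m))


if __name__ == "__main__":
    print(prob_at_most(2, 5, 5), f_u(3, 5, 5))
    print(round(float(prob_at_most(5, 5, 20)), 7))        # plant example: 0.0183512
    print(round(float(1 - prob_at_most(10, 5, 11)), 7))   # lunch counter: 0.0576923
    print(round(float(prob_at_most(4, 8, 8)), 7))         # steer calves: 0.0088578
```

`math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention that impossible group counts contribute nothing.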

The following tables have been prepared for use in testing data for randomness
and for testing whether two samples are from the same population.² Table I
gives $P\{u \le u'\}$ to 7 decimal places for $m \le n \le 20$, with $m$ ranging
from 2 to 20 inclusive, whereas Table II gives correct values of $u_\epsilon$ for
$\epsilon = .005, .01, .025, .05, .95, .975, .99$ and $.995$, where $u_\epsilon$ is the largest
integer $u'$ for which $P\{u \le u'\} \le \epsilon$ when $\epsilon < .50$, and is the smallest integer
$u'$ for which $P\{u \le u'\} \ge \epsilon$ when $\epsilon > .50$. This table was obtained from Table I
and covers the same

¹ W. L. Stevens, "Distribution of Groups in a Sequence of Alternatives," Annals of
Eugenics, Vol. IX, Part I (1939), pp. 10-17.

² A. Wald and J. Wolfowitz, "On a Test Whether Two Samples are from the Same
Population," Annals of Math. Stat., Vol. XI, No. 2, June (1940), pp. 147-162.
range of values of $m$ and $n$. Table III gives values of $u_\epsilon$ for $m = n$ from 10 to
100. These values of $u_\epsilon$ were obtained by using the normal approximation given
on page 161 of the Wald–Wolfowitz paper, together with a correction for
continuity not given in their article; this correction improved the approximation
for small values of $m$ and $n$. The values of $u_\epsilon$ for $m = n = 10$ through 20 are
included in Table III, although they can be obtained from Table II, in order to
check on the adequacy of the approximation. These values obtained with the
approximation check with those of Table II except for the five underscored
values. It appears that the approximation will be adequate in general for
$m = n > 20$.
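The normal approximation rests on the mean and variance of $u$ for a random arrangement, $E(u) = 2mn/(m+n) + 1$ and $\sigma_u^2 = 2mn(2mn - m - n)/[(m+n)^2(m+n-1)]$. A minimal sketch follows, assuming a continuity correction of one half; the exact form of the correction used for Table III is not stated in this section, and the function names are ours.

```python
from math import sqrt
from statistics import NormalDist


def runs_mean_var(m: int, n: int) -> tuple[float, float]:
    """Mean and variance of the number of groups u under random arrangement."""
    total = m + n
    mean = 2 * m * n / total + 1
    var = 2 * m * n * (2 * m * n - m - n) / (total**2 * (total - 1))
    return mean, var


def approx_prob_at_most(u_prime: int, m: int, n: int) -> float:
    """Normal approximation to P{u <= u'} with an assumed +1/2 continuity correction."""
    mean, var = runs_mean_var(m, n)
    return NormalDist(mean, sqrt(var)).cdf(u_prime + 0.5)
```

For $m = n = 20$ and $u' = 15$ this gives about .039, close to the exact value .038 quoted below for the corrosion example.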

To illustrate the use of these tables to test randomness of an arrangement,³
consider a case where one might suspect nonrandomness and, more specifically,
expect too few groups. The arrangement of diseased and healthy plants in a
row of a field might be such a case. For example, we might have the following
plant arrangement:

H H H H H H H H H H D H D D D D H H H H H H H H H,

where

m = 5, the number of diseased plants present,
n = 20, the number of healthy plants present,
u' = 5, the number of groups actually formed.

From Table I the probability associated with this arrangement is found to be
.018,3512, which is the probability of $u \le u'$. Since $P < .05$, we might elect
to regard this as evidence of a tendency for the disease to be nonrandomly
distributed among the plants in a row, knowing that if we look for an explanation
of "clustering" whenever $P\{u \le u'\} \le .05$, we may expect to follow a false
scent not more than one time in twenty in the long run.

When a control chart⁴ suggests the presence of assignable causes of variation
in a manufactured product flowing from a production line, an examination of
various types of runs, e.g. the lengths and relative frequency of runs above and
below the median of a sequence of values, may assist in diagnosing the nature
of the cause. Dr. Walter A. Shewhart has given us such an instance: a sequence
of observations dealing with corrosion suggested the presence of an
assignable cause of variation. By the use of run charts an assignable cause of
variation was tracked down in the measuring apparatus, and an attempt was
made to eliminate it. The original sequence, examined with regard to runs
above and below the median of the sequence, exhibited an unexpectedly large
number of runs of length 7 or more and, as a result, a significantly low value of

³ W. L. Stevens (ibid.).

⁴ American Defense Emergency Standards Z1.1 and Z1.2, entitled "Guide for Quality
Control" and "Control Chart Method of Analyzing Data," and American War Standard
Z1.3, entitled "Control Chart Method of Controlling Quality During Production"
(published by the American Standards Association, New York City).
$u$, and, if the assignable cause were not completely eliminated in the new design,
we might expect too large a proportion of long runs above and below the median
and, hence, too few total runs. A sequence of 40 observations taken with the
new measuring device yielded a total of 15 runs above and below the median of
the sequence, which is significantly fewer than would be expected to arise under
a state of statistical control, since for $m = n = 20$, $P\{u \le 15\} = .038$. This
sequence is of special interest, since the occurrence of too few runs suggested
that the assignable cause had not been entirely eliminated, although no especially
long runs (say of length 7 or more) occurred in this sequence; so from the
point of view of length of runs, without regard to their number, the assignable
cause might have been judged to have been eliminated.

As an instance where too many groups would be the probable alternative to
randomness, consider the arrangement of occupied and unoccupied seats at a
lunch counter about half an hour before the popular lunch hour begins. In
such a case the critical region would be $u \ge u'$ and the appropriate probability
would be $P = 1 - P\{u \le u' - 1\}$. Such a situation was observed and yielded
the following arrangement of empty and occupied seats along the lunch counter:

EOEEOEEEOEEEOEOE,

m = 5,

n = 11,

u' = 11,

P = 1 − .942,3077 = .057,6923;

and though this probability is not quite significant, the arrangement observed
has the maximum number of groups of empty and occupied seats for $m$ and
$n$ of the size observed, since no two occupied seats are adjacent. However, if
another customer had entered and sat either in the 5th empty seat from the
left or in the 8th empty seat, the number of groups would have been increased
by two and the situation would be:

m = 6,

n = 10,

u' = 13,

P = 1 − .989,5105 = .010,4895.

This $P$ value is significant, and for this assumed case, as well as for the actual
case observed, the arrangement of E's and O's has the maximum number of
groups of like objects. Certainly both of these cases exhibit too many groups
to be considered random arrangements.

The use of these tables to test whether two samples constitute independent
random samples from the same population⁵ can be illustrated by using the data
of Snedecor's Example 4.11 on page 75 of his Statistical Methods (3d edition)

⁵ A. Wald and J. Wolfowitz (ibid.) have pointed out that exceptionally small values of
$u'$ are to be regarded as evidence for rejecting this null hypothesis.
which gives daily gains in two lots of steer calves on two different rations. The
daily rates of gain given for the two lots are:

I. 1.95, 2.17, 2.06, 2.11, 2.24, 2.52, 2.04, 1.95;

V. 1.82, 1.85, 1.87, 1.74, 2.04, 1.78, 1.76, 1.86.

Arranging these rates in order of magnitude, designating a calf on ration I by
italics and one on ration V by ( ), we have

(1.74), (1.76), (1.78), (1.82), (1.85), (1.86), (1.87), 1.95, 1.95,

(2.04), 2.04, 2.06, 2.11, 2.17, 2.24, 2.52.

Whence

m = 8,
n = 8,
u' = 4,

P = .008,8578.

Accordingly, at either the .05 or .01 level of significance rejection of the null 
hypothesis that the two samples constitute independent random samples from 
the same population is indicated. 

For these data we note that having two identical values, i.e. 2.04, in
the two lots did not alter the number of groups, regardless of whether they were
recorded as (2.04), 2.04 or as 2.04, (2.04). However, such duplications may in general
be more bothersome, since they may yield different values of $u'$ depending
on the order in which they are considered. In such instances both possible
orders should be considered.
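Counting $u'$ is mechanical: it is one more than the number of changes of kind along the sequence. The sketch below (hypothetical helper names) counts groups in a sequence of symbols and, for two samples, merges and sorts them, trying both orders for tied values as recommended above. Only the two "one sample first at every tie" orders are tried, which suffices for a single tied pair such as 2.04 here; several ties would call for enumerating every ordering.

```python
from itertools import groupby


def count_groups(sequence) -> int:
    """Number of maximal runs of like symbols in the sequence."""
    return sum(1 for _ in groupby(sequence))


def two_sample_groups(a, b) -> set[int]:
    """Values of u' for samples a and b under the two tie-breaking orders."""
    results = set()
    for a_first in (True, False):
        tagged = ([(x, 0 if a_first else 1, "a") for x in a]
                  + [(x, 1 if a_first else 0, "b") for x in b])
        labels = [label for _, _, label in sorted(tagged)]
        results.add(count_groups(labels))
    return results


if __name__ == "__main__":
    print(count_groups("HHHHHHHHHHDHDDDDHHHHHHHHH"))  # plant row
    ration_i = [1.95, 2.17, 2.06, 2.11, 2.24, 2.52, 2.04, 1.95]
    ration_v = [1.82, 1.85, 1.87, 1.74, 2.04, 1.78, 1.76, 1.86]
    print(two_sample_groups(ration_i, ration_v))      # both tie orders
```

For the steer data both tie orders give the same $u' = 4$, as noted above.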

The merit of this test is that it employs a minimum of assumptions: merely
that the common population be continuous, and that the samples be drawn at
random independently. Its principal defect is its lack of power. As a
consequence, gross disparity between the samples is generally required to render
$u' \le u_\epsilon$. Therefore, when additional assumptions are tenable, tests utilizing
them should be employed.

Most of the computing and checking of these tables was done by Frieda S. 
Swed, Philip Ritz and Beatrice E. Kelley with some assistance from Jay Grod- 
man, Edward Halamka and Mrs. Henry Wallman. Also, Duane Borst and 
Francis Cox helped with the typing and the proofing of the tables. 



When $m = n$, the largest possible value of $u'$ is $2m$; when $m < n$, the largest possible value of $u'$ is $2m + 1$.
TABLE I (Concluded)

Values of $P\{u \le u'\}$

 u'    m = 19, n = 19   m = 19, n = 20   m = 20, n = 20
  2      .000,0000        .000,0000        .000,0000
  3      .000,0000        .000,0000            …
  4      .000,0000        .000,0000            …
  5      .000,0002        .000,0001            …
  6      .000,0019        .000,0009        .000,0005
  7      .000,0086        .000,0050            …
  8      .000,0462        .000,0280        .000,0165
  9      .000,1875        .000,1169        .000,0710
 10      .000,7174        .000,4611            …
 11      .002,2009        .001,4591            …
 12      .006,3548        .004,3501        .002,9046
 13      .015,3550        .010,8549        .007,4821
 14      .034,8553        .025,4705        .018,1627
 15      .068,2844        .051,5699        .037,9982
 16      .125,5915        .098,1013        .074,8356
 17      .204,3888        .164,9901        .130,0916
 18      .312,7350        .260,9611        .212,9756
 19          …            .372,9273        .314,2784
 20      .566,8804        .503,2583        .438,0928
 21      .687,2650        .627,0727        .561,9072
 22      .795,6112        .744,3706            …
 23      .874,4085        .835,0099        .787,0244
 24      .931,7156        .904,8070        .869,9084
 25      .965,1447        .948,4301        .925,1644
 26      .984,6450        .975,5734        .962,0018
 27      .993,6452        .989,1451            …
 28          …            .995,8908            …
 29      .999,2826            …                …
 30      .999,8125            …                …
 31      .999,9538        .999,8831            …
$-\infty < v < \infty$. Thus neither $y$ nor $w$ has an asymptotic normal distribution.
It is, of course, this fact which makes the criterion of minimum variance illusory.


3. Other polynomial distribution functions. Let repeated samples of $n$ independent values of $x$ be drawn from a population characterized by $D(x) = \frac{k+1}{a^{k+1}}\,x^k$, $0 \le x \le a$, and $k$ a positive integer or zero. It can be shown that the best linear estimate of the mean of the population is

$$y = \frac{(k+1)n + 1}{(k+2)n}\, x_n ,$$

where as before $x_n$ is the largest item of the sample. The sampling distribution of $y$ is easily obtained. It follows that

$$\sigma_y^2 = \frac{(k+1)a^2}{(k+2)^2[(k+1)n^2 + 2n]}, \qquad \frac{\sigma_y^2}{\sigma_{\bar x}^2} = \frac{k+3}{n(k+1) + 2},$$

where as usual $\bar x$ is the arithmetic mean of the sample. Again, if we write $u = \dots$, the limit of the distribution of $u$ as $n$ approaches infinity is, as before, …, $-\infty < u < 1$.
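The variance ratio above can be checked exactly, from the two variance formulas, and by simulation. The sketch below assumes the density $D(x) = (k+1)x^k/a^{k+1}$ on $[0, a]$, draws from it by inverting the distribution function $(x/a)^{k+1}$, and compares the simulated ratio of the variances of $y$ and $\bar x$ with $(k+3)/[n(k+1)+2]$. Function names, sample sizes and the seed are arbitrary choices of ours.

```python
import random
from fractions import Fraction


def exact_variances(k: int, n: int, a: Fraction = Fraction(1)):
    """Exact var(y) and var(x-bar) for D(x) = (k+1) x^k / a^(k+1) on [0, a]."""
    var_y = (k + 1) * a**2 / ((k + 2) ** 2 * ((k + 1) * n**2 + 2 * n))
    var_one = (k + 1) * a**2 / ((k + 3) * (k + 2) ** 2)  # one observation
    return var_y, var_one / n


def simulated_ratio(k: int, n: int, reps: int = 20000, seed: int = 7) -> float:
    """Monte Carlo estimate of var(y) / var(x-bar), taking a = 1."""
    rng = random.Random(seed)
    c = ((k + 1) * n + 1) / ((k + 2) * n)  # y = c * (largest item)
    ys, means = [], []
    for _ in range(reps):
        xs = [rng.random() ** (1.0 / (k + 1)) for _ in range(n)]  # inverse CDF
        ys.append(c * max(xs))
        means.append(sum(xs) / n)

    def var(values):
        mu = sum(values) / len(values)
        return sum((v - mu) ** 2 for v in values) / (len(values) - 1)

    return var(ys) / var(means)


if __name__ == "__main__":
    vy, vx = exact_variances(1, 10)
    print(vy / vx, simulated_ratio(1, 10))
```

For $k = 1$, $n = 10$ the exact ratio is $4/22 = 2/11$, and the simulated ratio should land close to it.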


A NOTE ON TOLERANCE LIMITS 

By Edward Paulson¹

Columbia University 

Among various statistical problems arising in the process of controlling quality
in mass production, a rather important one appears to be the determination of
tolerance limits when the variability of the product is known to be due to
random factors. This problem was recently treated in a pioneer article by Wilks [1].
This note will point out a relationship between tolerance limits and confidence
limits (used in the sense of Neyman), and will use this concept to establish
tolerance limits when the product is described by two qualities, the
measurements on which are assumed to have a bivariate normal distribution.

For the case of a single variate, the problem of finding tolerance limits as stated by Wilks is to find a sample size $n$ and two functions $L_1(x_1 \cdots x_n)$ and $L_2(x_1 x_2 \cdots x_n)$ so that, if $P = \int_{L_1}^{L_2} f(x)\,dx$ denotes the conditional probability of a future observation falling between the random variates $L_1$ and $L_2$, then

$$E(P) = \alpha, \qquad \text{Prob.}[\alpha - \Delta_1 < P < \alpha + \Delta_2] \ge \beta .$$

The relationship between confidence limits and tolerance limits will arise if
confidence limits are determined, not for a parameter of the distribution, but for

¹ Work done under a grant-in-aid from the Carnegie Corporation of New York.

a future random observation (or for some function of the observations in a future independent sample). This is based on the following simple lemma: if confidence limits $U_1(x_1 \cdots x_n)$ and $U_2(x_1 \cdots x_n)$ on a probability level $= \alpha_0$ are determined for $g$, a function of a future sample of $k$ observations, and $P = \int_{U_1}^{U_2} \phi(g)\,dg$, then $E(P) = \alpha_0$. For let $\phi(g)\,dg$ and $\psi(U_1, U_2)\,dU_1\,dU_2$ denote the distributions of $g$ and of $U_1, U_2$ respectively; then by the definition of expected value

$$E(P) = \iint \left[\int_{U_1}^{U_2} \phi(g)\,dg\right] \psi(U_1, U_2)\,dU_1\,dU_2 .$$

This triple integral is, however, exactly the probability that $g$ will lie between $U_1$ and $U_2$, which by the nature of confidence limits must equal $\alpha_0$; this proves the lemma. In a similar manner it follows that if, on the basis of a given sample, an $l$-dimensional confidence region is found for statistics $g_1, g_2, \dots, g_l$ derived from a future sample, and if $P$ denotes the probability that $g_1, \dots, g_l$ all fall in the confidence region, then $E(P)$ in repeated sampling equals $\alpha$. To establish tolerance limits it is necessary, in addition to $E(P)$, to know the distribution of $P$, or at least $\sigma_P^2$, so that the distribution of $P$ can be approximated.
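A quick numerical illustration of the lemma in the simplest univariate case: for a normal population, $\bar x \pm t_0 s\sqrt{1 + 1/n}$ are confidence limits of level $\alpha_0$ for one future observation, where $t_0$ is the two-sided Student point with $n - 1$ degrees of freedom. The sketch assumes $n = 10$, $\alpha_0 = .95$ and the tabled value $t_0 = 2.2622$ for 9 degrees of freedom (the caller must change `t0` together with `n`). Each replication computes the realized probability $P$; individual $P$'s vary, but their average should be close to $\alpha_0$, as the lemma asserts.

```python
import random
from math import sqrt
from statistics import NormalDist


def realized_coverages(n: int = 10, t0: float = 2.2622,
                       reps: int = 20000, seed: int = 3) -> list[float]:
    """Realized P for the limits x-bar +/- t0 * s * sqrt(1 + 1/n), N(0, 1) population.

    t0 must be the upper (1 - alpha_0)/2 point of Student's t with n - 1 d.f.;
    the default 2.2622 corresponds to n = 10 and alpha_0 = .95.
    """
    rng = random.Random(seed)
    phi = NormalDist()  # the population; the limits themselves do not use it
    ps = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half = t0 * s * sqrt(1 + 1 / n)
        # P = probability that a future observation falls between U1 and U2
        ps.append(phi.cdf(xbar + half) - phi.cdf(xbar - half))
    return ps


if __name__ == "__main__":
    ps = realized_coverages()
    print(sum(ps) / len(ps), min(ps), max(ps))
```

The spread between the smallest and largest $P$ is what the distribution of $P$, and not merely $E(P)$, is needed to control.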

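It appears, at least on an intuitive basis, that the "best" confidence interval can be used to determine the shape of the "most efficient" tolerance limits; this intuitive notion will gain additional support from the character of the tolerance region which will now be derived for an observation $(x, y)$ from a distribution with probability density $f(x, y)$, where

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-m_x)^2}{\sigma_x^2} - \frac{2\rho(x-m_x)(y-m_y)}{\sigma_x\sigma_y} + \frac{(y-m_y)^2}{\sigma_y^2}\right]\right\}.$$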


Suppose we have two independent samples

$$[(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)] \quad \text{and} \quad [(x, y)],$$

both from $f(x, y)$. Then it is known that

$$T^2 = \left(\frac{n}{n+1}\right)\frac{1}{1 - r^2}\left[\left(\frac{x - \bar x}{s_x}\right)^2 - 2r\,\frac{(x - \bar x)(y - \bar y)}{s_x s_y} + \left(\frac{y - \bar y}{s_y}\right)^2\right],$$

where $\bar x = \sum_{i=1}^{n} x_i/n$, $s_x^2 = \sum_{i=1}^{n}(x_i - \bar x)^2/(n - 1)$, etc., has the distribution of Hotelling's Generalized Student Ratio [2]. A confidence region for a future observation $(x, y)$ on the basis of a sample of $n$, on a level of significance $= \alpha$, will be given by the elliptic region $T^2 \le T_0^2$ (in the $x, y$ plane), where $T_0^2 = 2(n-1)F_0/(n-2)$ and $F_0$ is the value of the $F$ distribution (with $n_1 = 2$ and $n_2 = n - 2$ degrees of freedom) which is exceeded with probability $= 1 - \alpha$.
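$T^2$ can be computed directly from its definition. The sketch below (names and data are ours, made up for the illustration) does so and illustrates the invariance under nonsingular linear transformations that is used next: applying the same affine map to the sample and to the future point leaves $T^2$ unchanged.

```python
from math import sqrt


def t_squared(sample, point):
    """T^2 for a future point (x, y) relative to a bivariate sample of size n."""
    n = len(sample)
    xbar = sum(x for x, _ in sample) / n
    ybar = sum(y for _, y in sample) / n
    sxx = sum((x - xbar) ** 2 for x, _ in sample) / (n - 1)
    syy = sum((y - ybar) ** 2 for _, y in sample) / (n - 1)
    sxy = sum((x - xbar) * (y - ybar) for x, y in sample) / (n - 1)
    sx, sy = sqrt(sxx), sqrt(syy)
    r = sxy / (sx * sy)  # sample correlation; assumes |r| < 1
    dx, dy = (point[0] - xbar) / sx, (point[1] - ybar) / sy
    quad = (dx * dx - 2 * r * dx * dy + dy * dy) / (1 - r * r)
    return n / (n + 1) * quad


def affine(p):
    """An arbitrary nonsingular affine map, used only for the illustration."""
    x, y = p
    return (2 * x + y + 1, x - 3 * y)


sample = [(1.0, 2.0), (2.0, 1.5), (3.5, 4.0), (0.5, 1.0), (2.5, 3.5), (4.0, 2.5)]
point = (2.0, 3.0)

if __name__ == "__main__":
    print(t_squared(sample, point))
    print(t_squared([affine(p) for p in sample], affine(point)))
```

A future point then lies in the tolerance ellipse when `t_squared(sample, point)` does not exceed $T_0^2$, with $F_0$ taken from an $F$ table.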

If $P$ denotes the probability of a future observation falling in this ellipse, then $P = \iint_{T^2 \le T_0^2} f(x, y)\,dx\,dy$. By utilizing the fact [2] that $T^2$ is invariant under linear transformations, it is not difficult to see that the distribution of $P$ will not involve any unknown parameters, so its distribution can be calculated under the assumption $m_x = m_y = \rho = 0$, $\sigma_x = \sigma_y = 1$. Then

$$P = F(\bar x, \bar y, s_x, s_y, r) = \iint_{T^2 \le T_0^2} \frac{1}{2\pi}\, e^{-(x^2+y^2)/2}\,dx\,dy .$$


We know that $E(P) = \alpha$, and we will now calculate the variance of $P$ by expanding $P$ in a Taylor series (to terms of the first order) about the point $\bar x = 0$, $\bar y = 0$, $r = 0$, $s_x = 1$, $s_y = 1$. $P$ can clearly be put in the form


-u 


, 'v/ 

pj+r.V— .X 


e-^^'dx 


I 








dy 


Taking derivatives and evaluating about the population values 



2 

CTp — 



Since for ordinary values of $\alpha$ ($\alpha = .95$ or $.99$) the distribution of $P$ seems to approach normality very slowly, we will follow a suggestion of Wilks and suppose that a fairly close approximation to the distribution of $P$ will be given by

$$\frac{\Gamma(u+v)}{\Gamma(u)\Gamma(v)}\,P^{u-1}(1-P)^{v-1}\,dP, \tag{1}$$

where

$$u = \frac{\alpha^2(1-\alpha) - \alpha\sigma_P^2}{\sigma_P^2}, \qquad v = \frac{\alpha(1-\alpha)^2 - (1-\alpha)\sigma_P^2}{\sigma_P^2}.$$

This distribution can now be used to establish tolerance limits. For example,
it follows from (1) that for a sample size $n > 214$ and a tolerance region given
by the ellipse $T^2 = 9.21$, $E(P) = .99$ and $\text{Prob.}\{.985 < P < .995\} >
.992$.
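The exponents in (1) are method-of-moments values: they give the beta distribution mean $\alpha$ and variance $\sigma_P^2$. A short sketch (function names ours) computes $u$ and $v$ and confirms both moments; the value of $\sigma_P^2$ used in the check is arbitrary, not the one derived in the paper.

```python
def beta_exponents(alpha: float, var_p: float) -> tuple[float, float]:
    """u and v of (1): a beta law with mean alpha and variance var_p."""
    if not 0 < var_p < alpha * (1 - alpha):
        raise ValueError("need 0 < var_p < alpha * (1 - alpha)")
    u = (alpha**2 * (1 - alpha) - alpha * var_p) / var_p
    v = (alpha * (1 - alpha) ** 2 - (1 - alpha) * var_p) / var_p
    return u, v


def beta_moments(u: float, v: float) -> tuple[float, float]:
    """Mean and variance of the beta density proportional to P^(u-1) (1-P)^(v-1)."""
    s = u + v
    return u / s, u * v / (s * s * (s + 1))


if __name__ == "__main__":
    u, v = beta_exponents(0.99, 1e-5)
    print(u, v, beta_moments(u, v))
```

The guard reflects the fact that a beta distribution with mean $\alpha$ cannot have variance as large as $\alpha(1-\alpha)$.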

Care must be taken in the use of these and similar results, for if the
distribution is not a bivariate normal one, a large error may be introduced which will
not be eliminated with increasing $n$; however, the error will probably be small
when a tolerance region is found for the means $\bar x, \bar y$ of a future sample of $k$
observations ($k > 20$), as contrasted with a tolerance region for a single observation.
An exact treatment of the case when the bivariate distribution is unknown has
been given by Wald in the present issue of the Annals of Mathematical Statistics.

REFERENCES

[1] S. S. Wilks, "Determination of sample sizes for setting tolerance limits," Annals of
Math. Stat., Vol. 12 (1941), pp. 91-96.
A NEW APPROXIMATION TO THE LEVELS OF SIGNIFICANCE
OF THE CHI-SQUARE DISTRIBUTION

By Leo A. Aroian

Hunter College

Recent articles on the percentage points of the $\chi^2$ distribution [1], [2] have
directed my attention to a method proposed in my investigation of Fisher's $z$
distribution [3], a method particularly useful and easily computed for $n$ large.
In addition, this method avoids interpolation. If $t = (\chi^2 - n)/\sqrt{2n}$, and
$a_3 = \sqrt{8/n}$, the measure of skewness for the distribution, the following formulas
give significance levels of $t$ as quadratic functions of $a_3$, $t = a + b a_3 + c a_3^2$.
The values of $a$, $b$, and $c$ were found by the usual method of least squares, fitting
each formula to the values of $t$ [4] for $a_3 = 0, \pm 0.1, \pm 0.2, \pm 0.3$, and $\pm 0.4$.
Then the value of $a$ in each instance was adjusted to give the proper value of $t$
when $a_3 = 0$: e.g. the constant term by the method of least squares for the 1 per cent point is
2.32633, which we change to 2.32635. The range $|a_3| \le .4$ corresponds to
$n \ge 50$, but the formulas are quite satisfactory for $n \ge 30$. Formulas for $t$ when
$|a_3| > .4$ [3] are easily derived, but such results, while more accurate in the range
$30 \le n < 50$, would be considerably less accurate in the region $n \ge 50$.

TABLE I

Percentage points of $\chi^2$. For each level the first line is by the method of Wilson and Hilferty [2], the second by (1) or (2), and the third is the correct value.

 Level             n = 30     n = 40     n = 50     n = 75     n = 100
 0.5%   W-H        53.713     66.802     79.523    110.313    140.193
        (1),(2)    53.6883    66.776     79.496    110.286    140.166
        correct    53.6720    66.7659    79.4900   110.286    140.169
 1%     W-H        50.914     63.710     76.172    106.408    135.820
        (1),(2)    50.9015    63.6961    76.1568   106.392    135.804
        correct    50.8922    63.6907    76.1539   106.393    135.807
 5%     W-H        43.767     55.753     67.501     96.2135   124.340
        (1),(2)    43.7754    55.7600    67.5057    96.2168   124.342
        correct    43.7729    55.7585    67.5048    96.2160   124.342
 10%    W-H        40.246     51.796     63.159     91.055    118.493
        (1),(2)    40.2559    51.8048    63.1669    91.0611   118.498
        correct    40.2560    51.8050    63.1671    91.062    118.498
 25%    W-H        34.793     45.610     56.328     82.853    109.137
        (1),(2)    34.7987    45.6155    56.3333    82.8581   109.1416
        correct    34.7998    45.6160    56.3336    82.8582   109.141
 50%    W-H        29.338     39.337     49.336     74.335     99.335
        (1),(2)    29.3346    39.3346    49.3346    74.3346    99.3346
        correct    29.3360    39.3354    49.3349    74.3343    99.3341
 75%    W-H        24.486     33.668     42.949     66.422     90.138
        (1),(2)    24.4764    33.6597    42.9418    66.4169    90.1336
        correct    24.4776    33.6603    42.9421    66.4167    90.1332
 90%    W-H        20.604     29.055     37.693     59.799     82.362
        (1),(2)    20.6004    29.0514    37.6894    59.7951    82.3586
        correct    20.5992    29.0505    37.6886    59.7944    82.3581
 95%    W-H        18.491     26.5080    34.7634    56.0538    77.9294
        (1),(2)    18.4960    26.5114    34.7656    56.0546    77.9296
        correct    18.4926    26.5093    34.7642    56.0540    77.9295
 99%    W-H        14.925     22.139     29.685     49.457     70.049
        (1),(2)    14.9649    22.1703    29.7096    49.4741    70.0620
        correct    14.9535    22.1643    29.7067    49.4748    70.0648
 99.5%  W-H        13.744     20.669     27.957     47.178     67.303
        (1),(2)    13.7997    20.7121    27.9920    47.2021    67.3210
        correct    13.7867    20.7065    27.9907    47.2059    67.3276

TABLE II

Percentage points of $\chi^2$ for $n = 30$

 Level      First value   Second value
 .0001*       67.68         67.590
 .001         59.73         59.683
 .025         46.9821       46.9792
 .20          36.2494       36.250
 .30          33.5290       33.530
 .40          31.3144       31.3183
 .60          27.4402       27.4436
 .70          25.5064       25.508
 .80          23.3631       23.364
 .975         16.7962       16.7908
 .999         11.62         11.568
 .9999*        9.33          9.226

* First value by (1) or (2). Second value correct result.

After $t$ is calculated, $\chi^2 = n + \sqrt{2n}\,t$. The formulas are:

$$
\begin{aligned}
t_{50\%} &= -.16636\,a_3 \\
t_{40\%} &= .25335 - .15567\,a_3 - .012276\,a_3^2 \\
t_{30\%} &= .52440 - .12058\,a_3 - .024541\,a_3^2 \\
t_{25\%} &= .67449 - .090613\,a_3 - .030693\,a_3^2 \\
t_{20\%} &= .84162 - .048433\,a_3 - .036788\,a_3^2 \\
t_{10\%} &= 1.28155 + .107033\,a_3 - .04797\,a_3^2 \\
t_{5\%} &= 1.64485 + .28392\,a_3 - .04902\,a_3^2 \\
t_{2.5\%} &= 1.95997 + .47228\,a_3 - .04304\,a_3^2 \\
t_{1\%} &= 2.32635 + .73330\,a_3 - .024957\,a_3^2 \\
t_{.5\%} &= 2.5758 + .93600\,a_3 - .00377\,a_3^2 \\
t_{.1\%} &= 3.0903 + 1.4190\,a_3 + .05667\,a_3^2 \\
t_{.01\%} &= 3.7200 + 2.1260\,a_3 + .17449\,a_3^2
\end{aligned}
\tag{1}
$$

The maximum error for $t$ in the range $|a_3| \le .4$ is 2 in the fourth significant
figure, 1 in the fourth significant figure, 6 in the fifth, 3 in the fifth, 3 in the fifth,
1 in the fifth, 1 in the fifth, 3 in the fifth, 4 in the fifth, 4 in the fifth, 4 in the fifth
and 4 in the fourth significant figure for the .01%, .1%, .5%, 1%, 2.5%, 5%, 10%,
20%, 25%, 30%, 40%, and 50% points respectively. The error increases outside the
indicated range. In addition

$$
\begin{aligned}
t_{99.99\%} &= -3.7200 + 2.1260\,a_3 - .17449\,a_3^2 \\
t_{99.9\%} &= -3.0903 + 1.4190\,a_3 - .05667\,a_3^2
\end{aligned}
\tag{2}
$$

and similarly for other percentage points. These are obtained from (1) by
replacing $a_3$ by $-a_3$ and $t$ by $-t$.
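The recipe is: compute $a_3 = \sqrt{8/n}$, evaluate the quadratic in $a_3$ for the required level, and set $\chi^2 = n + \sqrt{2n}\,t$; levels above 50 per cent follow from (2) by the sign change. A Python sketch with the coefficients of (1) transcribed into a table (the dictionary layout and function name are ours):

```python
from math import sqrt

# Upper percentage point (per cent) -> (a, b, c) with t = a + b*a3 + c*a3**2,
# transcribed from formulas (1).
COEFFS = {
    0.01: (3.7200, 2.1260, 0.17449),
    0.1: (3.0903, 1.4190, 0.05667),
    0.5: (2.5758, 0.93600, -0.00377),
    1.0: (2.32635, 0.73330, -0.024957),
    2.5: (1.95997, 0.47228, -0.04304),
    5.0: (1.64485, 0.28392, -0.04902),
    10.0: (1.28155, 0.107033, -0.04797),
    20.0: (0.84162, -0.048433, -0.036788),
    25.0: (0.67449, -0.090613, -0.030693),
    30.0: (0.52440, -0.12058, -0.024541),
    40.0: (0.25335, -0.15567, -0.012276),
    50.0: (0.0, -0.16636, 0.0),
}


def chi2_point(n: int, level: float) -> float:
    """Approximate chi-square value exceeded with probability `level` per cent."""
    a3 = sqrt(8.0 / n)  # skewness of chi-square with n degrees of freedom
    if level <= 50.0:
        a, b, c = COEFFS[level]
        t = a + b * a3 + c * a3**2
    else:
        # Formulas (2): use the complementary level with a3 -> -a3 and t -> -t.
        a, b, c = COEFFS[round(100.0 - level, 2)]
        t = -(a - b * a3 + c * a3**2)
    return n + sqrt(2.0 * n) * t


if __name__ == "__main__":
    print(round(chi2_point(50, 1), 4), round(chi2_point(30, 95), 4))
```

The results match the second lines of Table I to the digits shown, for example 76.1568 for $n = 50$ at the 1 per cent point and 18.4960 for $n = 30$ at the 95 per cent point.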

We compare results obtained by these methods against those of Wilson and
Hilferty [2]. In all cases except at the 95% level the method here proposed is
superior. Table I compares the two methods. It was copied from [2] except
for the corrections in the Wilson and Hilferty method for the 95% level and in
the accurate value for $\chi^2$ at the 5% level for $n = 75$, 96.2160 in place of 96.11.
Table II gives comparisons for other levels when $n = 30$.
NEWS AND NOTICES


Readers are invited to submit to the Secretary of the Institute news items of
general interest.

Personal Items 

Associate Professor H. P. Evans of the Mathematics Department of the Uni- 
versity of Wisconsin has been promoted to a professorship. 

Assistant Professor Willy Feller of the Mathematics Department of Brown 
University has been promoted to an associate professorship. 

Dr. Carl F. Kossack of the Mathematics Department of the University of 
Oregon has been promoted to an assistant professorship. 

Dr. Eugene Lukacs has been appointed to an assistant professorship in the 
Mathematics Department of Illinois College. 

Professor E. B. Mode has been made chairman of the Mathematics Depart- 
ment of Boston University. 

Mr. Charles R. Mummery has been made Product Quality Engineer at the
Scioto Ordnance Plant of the U. S. Rubber Company.

Professor H. L. Rietz has retired after twenty-five years of service as Head of 
the Mathematics Department of the University of Iowa. 


The Foundation for the Study of Cycles has announced that a medal will be 
awarded to the individual making the most significant contribution to cycle re- 
search during 1943. Communications should be addressed to: Professor Ells- 
worth Huntington, Yale University, New Haven, Connecticut. 


Obituary 

Professor Edward L. Dodd of the Mathematics Department of the University 
of Texas died on January 9, 1943 at the age of sixty-seven years. He was a 
charter member of the Institute. He was elected as one of the Vice-Presidents of 
the Institute for 1943. His contributions to mathematical statistics consist of 
numerous research papers on probability; on general mean functions of statistical 
variables; and on statistical theory of periodicities. 


Stanford Courses in Statistical Methods of Quality Control 

A novel procedure in adult education, and particularly in statistical education, 
took place last summer at Stanford University, when courses in the Shewhart 
statistical methods of quality control were offered in short intensive courses. 
There were two courses, one on the campus at Stanford University, July 17-26, 
and the other in Los Angeles, September 20-27. The first course covered ten 
full days, and the second eight. Both courses ran eight hours per day, Satur- 
days and Sundays included. 
The features of the course may be described by the following points:

1. The courses were short, thus making it possible for men in industry to
attend.

2. The number of hours' instruction was sufficient to cover the field adequately.

3. The instruction covered a wide range of points of view. 

4. The students were picked delegates sent by industry. 

5. The courses are being followed up with monthly meetings in Los Angeles and 
San Francisco. 

Intensive courses of this character were first suggested by Dr. W. Edwards 
Deming in April of 1942, while he was temporarily detailed to the office of the 
Chief of Ordnance in the War Department, and the first course actually com- 
menced just three months later. By giving the course to men already in in- 
dustry, the yield obtained was manyfold higher than can be expected from a 
regular college course. Contributions and reports made by the delegates sub- 
sequent to the courses supply abundant foundation for this statement. 

West Coast industry and the Army and Navy ordnance districts sent 32 delegates
to the first course, and 31 to the second. Through the efforts of Professor 
Eugene Grant of Stanford, industry and the Army and Navy were persuaded to 
send some of their most valued officials. The instruction was organized by 
Professor Holbrook Working. Both he and Professor Grant took an active part 
in the instruction, which was supplemented in both courses by Dr. W. Edwards 
Deming as an exponent of government and industrial sampling. In the first 
course, Mr. Charles R. Mummery of the Hoover Company served as an in- 
structor from the viewpoint of industry. In the second course (the one in Los 
Angeles), Mr. Ralph E. Wareham of the General Electric Company occupied the 
industrial corner of the square of instruction. The expense of the instructors 
was paid out of ESMWT funds (Office of Education). Monthly follow-up courses 
in San Francisco and Los Angeles, under the direction of Professors Working and 
Grant, supply the necessary power for maintaining momentum, and for gathering 
the men together for directed study and consultation. 

The demand for men trained in this line far exceeds the supply, and there are 
movements afoot to provide similar courses in a number of industrial cities. 
Three-day courses in a dozen or more key ordnance cities were held last fall by 
the Ordnance Department. The lecturers were Messrs. G. D. Edwards and 
Harold F. Dodge of the Bell Telephone Laboratories, and Mr. G. Rupert Gause of 
the Aberdeen Proving Ground, now with the Army Ordnance in Washington. 
These courses and the Stanford courses alleviated the situation considerably, but 
further instruction is needed. 

Junior Membership in the Institute 

At the annual election for 1942 which was held by mail ballot because of the 
postponement of the Annual Meeting, constitutional amendments were approved 
which created a new grade of membership in the Institute, known as Junior 
Membership. It is hoped that this provision for Junior Membership will 
stimulate interest in mathematical statistics at the advanced undergraduate 
level in colleges and universities. 

The Board of Directors have approved the following rules governing Junior 
Membership: 

1. Any undergraduate student of a collegiate institution is eligible for election 
as a Junior member of the Institute of Mathematical Statistics provided 
that he or she is sponsored by a member of the Institute. 

2. The annual dues ($2.50) must be submitted with the application. 

3. Annual membership shall coincide with the calendar year and the Junior 
Member shall receive a complete volume of the Annals of Mathematical 
Statistics for the year in which he or she is elected. 

4. Junior Membership shall be limited to a term of two years, but a Junior 
Member may apply for transfer to ordinary membership at the beginning 
of his second year. 

For the convenience of any Institute member who may wish to sponsor a 
Junior Member an application blank is provided at the back of this issue of the 
Annals. Additional blanks may be obtained from the Secretary of the Insti- 
tute. 


Announcement of May Meeting in New York 

There will be a joint meeting between the Institute and the American Society 
of Mechanical Engineers on Saturday, May 29, 1943, at the Engineering Societies 
Building, 29 West 39th Street, N. Y. 

The meeting will consist of two sessions on industrial applications of mathe- 
matical statistics. The topics are as follows: 

Morning Session, 10 A.M. 

Chairman: Harold Hotelling 

1. J. Wolfowitz, On the Theory of Runs with some Applications to Quality Control. 

2. Churchill Eisenhart, On the Presentation of Data as Evidence. 

Afternoon Session, 2 P.M. 

Chairman: W. A. Shewhart 

1. H. F. Dodge, A Sampling Inspection Plan for Continuous Production. 

2. L. C. Young, Tolerances and Product Acceptability. 


ANNUAL REPORT OF THE PRESIDENT OF THE INSTITUTE 

Ordinarily at the business meeting and at the luncheon customarily held as 
part of the annual meeting of the Institute, the President has the opportunity to 
make public acknowledgement to those individuals, aside from the officers, who 
served the Institute during the year, and to have his say concerning past progress 
and future plans. This year it appears that the pages of the Annals must be 
used for this purpose. 
As the result of proposals made and approved at our last regular annual meeting
in New York City, a larger number of special committees than usual were appointed
for 1942 and thus more members than before were specifically asked to 
participate in the affairs of the Institute. The Institute is much indebted to 
these individuals for the way in which they responded. 

Professors A. T. Craig, Harold Hotelling, and S. S. Wilks, Chairman, consti- 
tuted a committee to study the Board of Directors of the Institute and the 
formal connection between the Institute and its journal, The Annals of Mathematical
Statistics. Their recommendations were incorporated in amendments 
to the constitution and by-laws recently approved by the Institute. The Board 
was increased in size and given greater continuity by including in it the two 
previous presidents, and the editor of the Annals, ex officio, and by increasing 
the term of the Secretary-Treasurer to three years. 

The new class of junior memberships is the result of a study of this question by 
a committee composed of Professors J. H. Bushey, Boyd Harshbarger, and G. W. 
Snedecor, Chairman. Regulations, since approved by the Board, under which 
local chapters of the Institute may be formed, were drawn up by a committee 
consisting of Dr. C. F. Kossack, Professor H. D. Larsen, and Professor B. H. 
Camp, Chairman. 

Dr. L. A. Aroian, Dr. J. F. Daly, Mr. H. F. Dodge, and Professor W. D. Baten, 
Chairman, as a committee agreed to assist the Institute by endeavoring to bring 
the Annals of Mathematical Statistics to the favorable attention of municipal, 
industrial, and college libraries which had not been subscribers to it. As a result 
of their fine work, a good number of domestic libraries have been added to our subscription
list, thus serving to counterbalance our losses abroad. 

The Program Committees for the year consisted of Professor Churchill 
Eisenhart and Mr. E. C. Molina, Chairman, for the September meeting in Poughkeepsie,
New York, and of Professor P. S. Dwyer and Dr. W. E. Deming, chairman,
for the projected Cleveland meeting. The Institute is always much indebted
to those who do the work of arranging its programs for meetings, but this 
year we owe a special acknowledgement to Dr. Deming, who prepared an excellent 
program for Cleveland, then one for a New York meeting under extremely short 
notice when the Cleveland meeting was cancelled, and then had that meeting also 
cancelled. Dr. W. R. Van Voorhis acted as our representative on the Committee 
on Local Arrangements for the meeting planned for Cleveland. 

The membership committee appointed for 1942 was made up of Dr. W. E. 
Deming, Professor E. L. Dodd, and Professor A. T. Craig, chairman. After 
Professor Craig took up his commission in the Navy, Professor B. H. Camp 
agreed to take his place on this committee. 

For some years Professor A. T. Craig has generously acted as custodian of our 
files of back numbers of the Annals. This service to the Institute has been 
taken over by Professor L. A. Knowler, who has been of much assistance. Mr. 
W. E. Blanche did a considerable amount of work in connection with finding 
advertisers for the Annals. 
Our annual meetings had been increasingly successful in recent years and it 
was a real sacrifice for the Institute to forego the one planned for 1942. Though 
the war is demonstrating in still more ways and places the importance of sound 
statistical methods, for the present it imposes serious responsibilities on the 
friends of the Institute. A reading of the report of our faithful and efficient 
Secretary-Treasurer will amplify this statement. Of the present Board, Pro- 
fessors Olds, Wilks and Craig met in Pittsburgh January 23 and 24 to consider 
some of our problems. Though there seems no prospect of a national meeting in 
the coming year, it is hoped that some local meetings can be held and that in 
other ways we can keep up the activities of the Institute. In particular there 
exists the opportunity of organizing local chapters of the Institute in the larger 
centers which would be particularly valuable now. In industrial areas we may 
contribute to the war effort as well as promote an important aspect of mathe- 
matical statistics by endeavoring to be useful in the development and application 
of industrial statistics. It is clear that the Institute needs the loyal support of 
its membership now as much as ever before if it is to fulfill the functions for which 
it was founded. 

Cecil C. Craig, 

President. 

December 31, 1942. 


ANNUAL REPORT OF THE SECRETARY-TREASURER OF THE INSTITUTE 

On September 8-9 the Institute met at Vassar College in conjunction with the 
American Mathematical Society and the Mathematical Association of America. 
Mr. E. C. Molina and Professor Churchill Eisenhart were in charge of the program.
Fifty-eight members of the Institute attended the meeting. The Annual 
Meeting, originally scheduled for Cleveland then transferred to New York City, 
was finally postponed at the request of the Office of Defense Transportation. At 
the present time it seems that this meeting will have to be abandoned entirely 
and the Institute must be content with holding local meetings in some of the 
larger cities. 

Because of the postponement of the Annual Meeting, the annual election was 
held by mail. The following officers were elected: Professor Cecil C. Craig, 
President; Professors Edward L. Dodd and Abraham Wald, Vice-Presidents; 
and Professor Edwin G. Olds, Secretary-Treasurer. Nine amendments to the 
Constitution and six amendments to the By-Laws were proposed and accepted by 
a two-thirds majority of those voting. Professor K. L. Fetters acted as teller. 

During the past year the Secretary has cooperated with industrial concerns and 
government agencies in locating statistically trained personnel to fill positions 
created by the emergency. Members of the Institute are requested to keep the 
Secretary informed regarding the availability of such personnel. 

The death of one member of the Institute has been reported since the last 
annual meeting: Dr. Robert Henderson, former Vice-President and Actuary of 
the Equitable Life Assurance Society. 

The following financial statement covers the period from December 10, 1941 to 
December 10, 1942 (the books and records of the Treasurer have been audited by 
Mr. George E. Niver and found to be in agreement with the statement as submitted):

FINANCIAL STATEMENT 
December 10, 1941, to December 10, 1942 

Receipts 

Balance on Hand, December 10, 1941                         $1,561.54
Dues                                                        2,293.15
Subscriptions                                               1,289.74
Sales of Back Numbers                                       1,393.65
Cumulative Index                                                5.00
Miscellaneous                                                  28.50
Total Receipts                                             $6,571.58

Expenditures 

Annals Office
   Editorial Expenses                                        $127.98
Waverly Press
   Printing and Mailing Annals, 4 issues                    3,227.34
Back Numbers Office
   Purchase of back numbers from H. C. Carver    $355.77
   Reprinting 300 copies of Vol. V, No. 4         142.16     $497.93
Library Committee                                              16.77
Secretary-Treasurer's Office
   Printing and Supplies                          $53.58
   Binding                                         30.00
   Postage                                        139.16
   Clerical Help                                  215.50     $438.24
Printing Programs for Meetings                                101.11
Miscellaneous                                                   7.08
Total Expenditures                                         $4,416.45
Balance on Hand, December 10, 1942                          2,155.13
                                                           $6,571.58
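As a check on the statement, the totals can be recomputed. The short Python sketch below is an illustration only: the figures are transcribed from the statement, and the grouping of items under each expenditure heading is an assumption inferred from the printed subtotals ($497.93 and $438.24). Exact decimal arithmetic avoids binary rounding of cents.

```python
from decimal import Decimal

# Receipts as printed in the statement (dollars).
RECEIPTS = {
    "Balance on Hand, December 10, 1941": Decimal("1561.54"),
    "Dues": Decimal("2293.15"),
    "Subscriptions": Decimal("1289.74"),
    "Sales of Back Numbers": Decimal("1393.65"),
    "Cumulative Index": Decimal("5.00"),
    "Miscellaneous": Decimal("28.50"),
}

# Expenditure headings mapped to their line items (grouping assumed from subtotals).
EXPENDITURES = {
    "Annals Office": [Decimal("127.98")],
    "Waverly Press": [Decimal("3227.34")],
    "Back Numbers Office": [Decimal("355.77"), Decimal("142.16")],
    "Library Committee": [Decimal("16.77")],
    "Secretary-Treasurer's Office": [
        Decimal("53.58"), Decimal("30.00"), Decimal("139.16"), Decimal("215.50"),
    ],
    "Printing Programs for Meetings": [Decimal("101.11")],
    "Miscellaneous": [Decimal("7.08")],
}


def check_statement():
    """Return (total receipts, subtotal per heading, total expenditures, closing balance)."""
    total_receipts = sum(RECEIPTS.values(), Decimal("0"))
    subtotals = {name: sum(items, Decimal("0")) for name, items in EXPENDITURES.items()}
    total_expenditures = sum(subtotals.values(), Decimal("0"))
    return total_receipts, subtotals, total_expenditures, total_receipts - total_expenditures


if __name__ == "__main__":
    receipts, subtotals, spent, balance = check_statement()
    print(receipts, spent, balance)
```

The recomputed receipts, expenditures, and closing balance agree with the printed $6,571.58, $4,416.45, and $2,155.13.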

In comparison with the financial condition of the Institute at the end of 1941, 
the receipts from dues, subscriptions, and sales of back numbers have increased 
more than $800. This is mostly due to a large increase in the sales of back num- 
bers and a net increase of fifty members. The increase in expenditures of the 
Institute was accounted for by the increased cost in printing the Annals. This 
marks the beginning of a trend which seems likely to continue throughout the war. 
It would seem over-optimistic to expect that the financial situation of the 
Institute would continue to show marked improvement in 1943. Present indi- 
cations suggest that we shall be very fortunate to avoid a considerable deficit in 
operations. A large number of our foreign subscribers have not renewed and we 
face considerable difficulty in delivery of the Annals to those still in force. The 
large increase in the sales of back numbers was due to a rather successful effort to 
persuade domestic libraries to provide themselves with complete sets of back 
numbers while the issues were still available. The Institute faces an increase in 
operating expenses and an advance in the cost of producing the Annals. The 
full cooperation of all members is needed if we are to avoid a decrease in the work 
of the Institute during 1943. 

Edwin G. Olds, 
Secretary-Treasurer. 

December 31, 1942. 

On behalf of the Board of Directors of the Institute, I regret to announce the 
sudden death of Vice-President E. L. Dodd, on January 9, 1943, shortly after 
this report was written. Dr. W. E. Deming was appointed by the Board of 
Directors to fill the vacancy created by Vice-President Dodd’s death. 

E. G. O. 

CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Junior Members, Fellows, 
Honorary Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others, Junior 
Members excepted, who have been members for twenty-three months prior to the date of 
voting. 

3. No person shall be a Junior Member of the Institute for more than a limited term as 
determined by the Committee on Membership and approved by the Board of Directors. 

ARTICLE III 

Officers, Board of Directors, and Committee on Membership 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secre- 
tary-Treasurer. The terms of office of the President and Vice-Presidents shall be one year 
and that of the Secretary-Treasurer three years. Elections shall be by majority ballots at 
Annual Meetings of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the in- 
dividuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers, the two previous 
Presidents, and the Editor of the Official Journal of the Institute. 

3. The Institute shall have a Committee on Membership composed of three Fellows. 
At their first meeting subsequent to the adoption of this Constitution, the Board of Di- 
rectors shall elect three members as Fellows to serve as the Committee on Membership, 
one member of the Committee for a term of one year, another for a term of two years, 
and another for a term of three years. Thereafter the Board of Directors shall elect from 
among the Fellows one member annually at their first meeting after their election for a 
term of three years. The president shall designate one of the Vice-Presidents as Chairman 
of this Committee. 

ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from 
time to time by the Board of Directors and shall be called at any time by the President 
upon written request from ten Fellows. Notice of the time and place of meeting shall be 
given to the membership by the Secretary-Treasurer at least thirty days prior to the date 
set for the meeting. All meetings except executive sessions shall be open to the public. 
Only papers accepted by a Program Committee appointed by the President may be pre- 
sented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board may 
be held from time to time at the call of the President or any two members of the Board. 
Notice of each meeting of the Board, other than the two regular meetings, together with a 
statement of the business to be brought before the meeting, must be given to the members 
of the Board by the Secretary-Treasurer at least five days prior to the date set therefor. 
Should other business be passed upon, any member of the Board shall have the right to 
reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual 
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the 
Secretary-Treasurer at least five days before the date set therefor. Should other business 
be passed upon, any member of the Committee shall have the right to reopen the question 
at the next meeting. 

4. At a regularly convened meeting of the Board of Directors, four members shall con- 
stitute a quorum. At a regularly convened meeting of the Committee on Membership, 
two members shall constitute a quorum. 

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
The Editor of the Annals of Mathematical Statistics shall be a Fellow appointed by the 
Board of Directors of the Institute. The term of office of the Editor may be terminated at 
the discretion of the Board of Directors. 

2. Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 
Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly 
convened meeting of the Institute provided notice of such proposed amendment shall have 
been sent to each voting member by the Secretary-Treasurer at least thirty days before the 
date of the meeting at which the proposal is to be acted upon. Voting may be in person or 
by mail. 


BY-LAWS 

ARTICLE I 

Duties of the Officers, the Editor, Board of Directors, and Committee on Mem- 
bership 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, shall 
preside at the meetings of the Institute and of the Board of Directors. At meetings of the 
Institute, the presiding officer shall vote only in the case of a tie, but at meetings of the 
Board of Directors he may vote in all cases. At least three months before the date of the 
annual meeting, the President shall appoint a Nominating Committee of three members. 
It shall be the duty of the Nominating Committee to make nominations for Officers to be 
elected at the annual meeting and the Secretary-Treasurer shall notify all voting members 
at least thirty days before the annual meeting. Additional nominations may be sub- 
mitted in writing, if signed by at least ten Fellows of the Institute, up to the time of the 
meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings at 
the meetings of the Institute and of the Board of Directors, send out calls for said meetings 
and, with the approval of the President and the Board, carry on the correspondence of the 
Institute. Subject to the direction of the Board, he shall have charge of the archives and 
other tangible and intangible property of the Institute, and once a year he shall publish in 
the Annals of Mathematical Statistics a classified list of all Members and Fellows of the 
Institute. He shall send out calls for annual dues and acknowledge receipt of same; pay 
all bills approved by the President for expenditures authorized by the Board or the Insti- 
tute ; keep a detailed account of all receipts and expenditures, prepare a financial statement 
at the end of each year and present an abstract of the same at the annual meeting of the 
Institute after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. Subject to the direction of the Board, the Editor shall be charged with the responsibility
for all editorial matters concerning the editing of the Annals of Mathematical Statistics.
He shall, with the advice and consent of the Board, appoint an Editorial Committee
of not less than twelve members to co-operate with him; four for a period of five years, 
four for a period of three years, and the remaining members for a period of two years, ap- 
pointments to be made annually as needed. All appointments to the Editorial Com- 
mittee shall terminate with the appointment of a new Editor. The Editor shall serve as 
editorial adviser in the publication of all scientific monographs and pamphlets authorized 
by the Board. 

4. The Board of Directors shall have charge of the funds and of the affairs of the Institute,
with the exception of those affairs specifically assigned to the President or to the 

ON TRANSFORMATIONS USED IN THE ANALYSIS OF VARIANCE 


By J. H. Curtiss 
Cornell University 

1. Introduction. Transformations of variates to render their distributions 
more tractable in various ways have long been used in statistics [12, chapter 
XVI]. The present extensive use of the analysis of variance, particularly as 
applied to data derived from designs such as randomized blocks and Latin 
squares, has placed new emphasis on the usefulness of such transformations. 
In the more usual significance tests associated with the analysis of variance, it 
is assumed a priori that the plot yields are statistically independent normally 
distributed variates which all have the same variance, but which have possibly 
different means. The hypotheses to be tested are then concerned with relations 
among these means. But in practice, it sometimes seems appropriate to specify 
for each variate a distribution in which the variance depends functionally upon 
the mean; moreover, in such cases, the specification is generally not normal. 
For example, when the data are in the form of a series of counts or percentages, a 
Poisson exponential or binomial specification may seem in order, and the variance
of either of these distributions is functionally related to the mean of the 
distribution. Before applying the usual normal theory to such data, it is 
clearly desirable to transform each variate so that normality and a stable vari- 
ance are achieved as nearly as possible. 

Various transformations have been devised to do this, and a number of articles 
explaining the nature and use of these transformations have recently been 
published.¹ However, the available literature on the subject appears to be 
mainly descriptive and non-mathematical. The object of this paper is to pro- 
vide a general mathematical theory (sections 2 and 3) for certain types of trans- 
formations now in use. In the framework of this theory we shall discuss in 
particular the square root and inverse sine transformations (section 4), and also 
several logarithmic transformations (section 4 and section 6). 

2. General theory. As it arises in the analysis of variance, the problem of 
stabilizing a variance functionally related to a mean may be stated as follows: 
Suppose $X$ is a variate whose mean $\mu = E(X)$ is a real variable with a range $S$ of 
possible values, and whose standard deviation $\sigma = \sigma_X = \sigma(\mu)$ is a function of $\mu$ 
not identically constant. Required, to find a function $T = f(X)$ such that 
both $f(X)$ and $\sigma_T^2 = E\{[T - E(T)]^2\}$ are functionally independent of $\mu$ for $\mu$ on $S$. 

By "functionally independent," we mean that

$$ \frac{\partial f(X)}{\partial \mu} = 0 \quad \text{and} \quad \frac{\partial \sigma_T^2}{\partial \mu} = 0 $$

for $\mu$ on $S$. 


¹ See references [1], [2], [3], [4], [5], [6], [13], [16]. 
The following line of argument is adopted in certain of the references mentioned 
above ([1], [2], [3], [4]): From the relation $dT = f'(X)\,dX$, we deduce as 
an approximation by some sort of summation process that $\sigma_T \approx f'(\mu)\sigma(\mu)$. 
Setting this expression equal to a constant, say $c$, we obtain $f'(\mu) = c/\sigma(\mu)$, 
so $f(x)$ is an indefinite integral of $c/\sigma(x)$. The roughness of the approximation 
used here is only too apparent.² For example, if $X$ is normally distributed, then 
the variance of $T = X^2$ as given by the approximation is $4\mu^2\sigma^2$, while actually 
it is $4\mu^2\sigma^2 + 2\sigma^4$. 
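The gap between the heuristic and the exact variance of $T = X^2$ can be seen numerically. The Python sketch below (function names are illustrative, not from the paper) compares the approximation $4\mu^2\sigma^2$, the exact value obtained from the normal moments $E(X^2) = \mu^2 + \sigma^2$ and $E(X^4) = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$, and a seeded Monte Carlo estimate.

```python
import random
import statistics


def approx_var_square(mu, sigma):
    """Heuristic sigma_T^2 = [f'(mu) * sigma]^2 for f(x) = x**2, i.e. 4 mu^2 sigma^2."""
    return (2 * mu * sigma) ** 2


def exact_var_square(mu, sigma):
    """Exact Var(X^2) for normal X, computed as E(X^4) - [E(X^2)]^2."""
    ex2 = mu ** 2 + sigma ** 2
    ex4 = mu ** 4 + 6 * mu ** 2 * sigma ** 2 + 3 * sigma ** 4
    return ex4 - ex2 ** 2  # simplifies to 4 mu^2 sigma^2 + 2 sigma^4


def simulated_var_square(mu, sigma, size=100_000, seed=1):
    """Seeded Monte Carlo estimate of Var(X^2) from normal draws."""
    rng = random.Random(seed)
    return statistics.variance([rng.gauss(mu, sigma) ** 2 for _ in range(size)])


if __name__ == "__main__":
    for mu, sigma in [(1.0, 1.0), (5.0, 1.0)]:
        print(mu, sigma, approx_var_square(mu, sigma),
              exact_var_square(mu, sigma), simulated_var_square(mu, sigma))
```

For $\mu = \sigma = 1$ the heuristic gives 4 while the exact value is 6; the neglected term $2\sigma^4$ matters less, relatively, as $\mu/\sigma$ grows.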

Indeed, it is easily seen that in important special cases the problem of stabilization
as above stated could have no solution other than the trivial one in 
which $T$ is identically constant on the set of points of increase of the d.f.³ of $X$. 
For instance, if $X$ has a Poisson exponential distribution, then the identity 
$E\{[f(X) - E[f(X)]]^2\} = c$, or $E\{[f(X)]^2\} = c + \{E[f(X)]\}^2$, becomes 

$$ e^{-\mu}\sum_{x=0}^{\infty} [f(x)]^2 \frac{\mu^x}{x!} = c + e^{-2\mu}\left[\sum_{x=0}^{\infty} f(x)\frac{\mu^x}{x!}\right]^2. $$

Expanding both sides in powers of $\mu$, we need only equate the coefficients of the 
zero-th power of $\mu$ on each side to find that $[f(0)]^2 = c + [f(0)]^2$, which implies 
that $c = 0$ and hence that $f(0) = f(1) = f(2) = \cdots$. A similar demonstration 
can be given for the case in which $X$ has a binomial distribution with a fixed 
number of values of the variate. 
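The constant-term argument can also be watched numerically: whatever $f$ is chosen, $P(X = 0) = e^{-\mu} \to 1$ as $\mu \to 0$, so the variance of $f(X)$ is driven toward 0 and cannot stay equal to a fixed positive $c$. The Python sketch below sums the Poisson series directly (truncated, an assumption adequate for moderate $\mu$); the function name is hypothetical and $f(x) = \sqrt{x}$ is only an example choice.

```python
import math


def poisson_var_of(f, mu, terms=None):
    """Var f(X) for Poisson X with mean mu, by summing the series over x = 0, 1, 2, ..."""
    if terms is None:
        # Truncation point well beyond the bulk of the distribution.
        terms = int(mu + 12 * math.sqrt(mu) + 30)
    p = math.exp(-mu)  # P(X = 0)
    m1 = m2 = 0.0
    for x in range(terms):
        v = f(x)
        m1 += v * p
        m2 += v * v * p
        p *= mu / (x + 1)  # P(X = x + 1) from P(X = x)
    return m2 - m1 * m1


if __name__ == "__main__":
    for mu in (0.001, 0.01, 0.1, 1.0, 10.0):
        print(mu, poisson_var_of(math.sqrt, mu))
```

For small $\mu$ the variance of $\sqrt{X}$ is roughly $\mu$ itself, so no choice of $f$ can hold it at a nonzero constant over all $\mu$.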

As to the problem of choosing $T = f(X)$ so that its distribution is exactly 
normal, we can observe immediately that a single-valued function $f(X)$ will 
never transform a variate $X$ with a discrete distribution into a variate with a 
continuous one. On the other hand, any variate $X$ with a continuous d.f. 
$F(x)$ can be transformed into a normally distributed variate $T$ by the transformation
$T = f(X)$ defined by the equation 

$$ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{T} e^{-t^2/2}\,dt = F(X). $$

However, aside from the practical difficulty of solving this equation for $T$, the 
resulting function $T = f(X)$ will not generally be functionally independent of 
the mean of $X$. 
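A minimal sketch of this normalizing transformation, assuming for illustration that $X$ is exponential with mean $m$, so that $F(x) = 1 - e^{-x/m}$. Then $T = \Phi^{-1}(F(X))$, where $\Phi^{-1}$ is computed with Python's `statistics.NormalDist.inv_cdf`. The function names are hypothetical. The transformed sample is close to standard normal, but the transformation itself involves $m$, the mean of $X$, which is exactly the dependence the text warns about.

```python
import math
import random
import statistics

STD_NORMAL = statistics.NormalDist()  # standard normal: mean 0, sd 1


def exponential_cdf(x, mean):
    """F(x) = 1 - exp(-x/mean) for x >= 0 (illustrative continuous d.f.)."""
    return -math.expm1(-x / mean)


def normalizing_transform(x, mean):
    """T = Phi^{-1}(F(x)); note that the transform depends on the mean of X."""
    return STD_NORMAL.inv_cdf(exponential_cdf(x, mean))


def sample_t(mean, size, seed=0):
    """Draw exponential X with the given mean and return the transformed values T."""
    rng = random.Random(seed)
    return [normalizing_transform(rng.expovariate(1.0 / mean), mean) for _ in range(size)]


if __name__ == "__main__":
    ts = sample_t(5.0, 20_000, seed=2)
    print(statistics.fmean(ts), statistics.stdev(ts))  # expected to be near 0 and 1
```

The same observed value $x = 1$ maps to different $T$ when $m = 1$ and when $m = 2$, so $f$ cannot be written down without knowing the mean.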

These considerations lead us to seek asymptotic solutions to the problems of 
normalization and stabilization. Such solutions are considered in the next 
section. 

3. Asymptotic theorems. In the remainder of this paper, we shall suppose 
that the distribution of $X$ depends on a parameter $n$ which is to tend somehow to 

² Tippett [14] says: “This derivation is not mathematically sound, and the result is only 
justified if on application it is found to be satisfactory.” 

³ i.e., distribution function. For any given one-dimensional variate $X$ we shall denote 
the probability or relative frequency assigned to a set $A$ by $P(A)$. The d.f. of the variate 
then is the point function $F(x) = P(X \le x)$. This function is sometimes called the cumulative
frequency function of $X$. 
infinity. The mean $\mu = \mu_n$ of $X$, with range $S_n$, will in general depend upon $n$ 
(although by this we do not mean to exclude the case in which $\mu_n$ is constant for 
all values of $n$), and perhaps will depend also on some further independent 
parameters, which we shall denote collectively by $\theta$, with range $\Omega$. We shall 
seek a variate $T = f(X)$, in which $f(X)$ is functionally independent of $\mu$ and of 
the parameters $\theta$ for $\mu$ on $S_n$, $\theta$ on $\Omega$, and such that the distribution of 
$f(X) - f(\mu_n)$ tends as $n \to \infty$ to a normal distribution, while $\lim_{n\to\infty} \sigma_T^2 = c^2$, where $c$ is 
an absolute constant. It is implied here that in case the additional parameters 
$\theta$ are present, the function $f(X)$ may depend non-trivially on $n$; but if $n$ is the 
only parameter on which the distribution of $X$ depends, then $f(X)$ must be 
functionally independent of $n$. 

A solution to the problem just proposed is given in certain cases by Theorems 
3.1 and 3.2 below, which are suggested by the heuristic reasoning of the second 
paragraph of section 2. 

Theorem 3.1. Let $\psi_n(x)$ be a non-negative function of $x$ and $n$, defined almost 
everywhere and integrable⁴ with respect to $x$ over any finite interval of the $x$-axis for 
each $n > 0$. Let 

$$ T = f(X) = \int_a^X \psi_n(x)\,dx, $$

where $a$ is an arbitrary constant. Let $F_n(y)$ be the d.f. of the variate 
$Y = (X - \mu_n)\psi_n(\mu_n)$. Suppose further that a continuous d.f. $F(y)$ exists such that 
$\lim_{n\to\infty} F_n(y) = F(y)$ for all values of $y$. Then either one of the following two 
conditions is a sufficient condition for the d.f. $H_n(w)$ of the variate $W = f(X) - f(\mu_n)$ 
to tend uniformly to $F(w)$, $-\infty < w < \infty$: 

(a) To each $w$ for which $0 < F(w) < 1$, there corresponds for all $n$ sufficiently 
large at least one root $x = x_n$ to the equation 

$$ \int_{\mu_n}^{x} \psi_n(u)\,du = w, \tag{3.1} $$

and this root $x_n$ has the property that 

$$ \lim_{n\to\infty} (x_n - \mu_n)\psi_n(\mu_n) = w. \tag{3.2} $$

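(b) For all $n$ sufficiently large, $\psi_n(\mu_n) > 0$, and $\lim_{n\to\infty} q_n(w) = 1$ uniformly in 
any closed finite subinterval of the open interval defined by $0 < F(w) < 1$, where 

$$ q_n(w) = \frac{\psi_n\left(\mu_n + w/\psi_n(\mu_n)\right)}{\psi_n(\mu_n)}. \tag{3.3} $$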
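To make condition (b) concrete, take the square-root case. Suppose, as an assumption for illustration only, that $X$ is Poisson with $\mu_n = n$ and $\psi_n(x) = 1/(2\sqrt{x})$ (taken as 0 for $x \le 0$), so that $f(X) = \sqrt{X}$ up to the constant fixed by $a$, and $Y = (X - n)/(2\sqrt{n})$ tends to a normal law with variance $1/4$. Then (3.3) gives $q_n(w) = \sqrt{n}/\sqrt{n + 2w\sqrt{n}}$. The Python sketch below (hypothetical function names) measures the largest departure of $q_n$ from 1 on the closed interval $-3 \le w \le 3$.

```python
import math


def psi(x):
    """Illustrative psi_n(x) = 1/(2*sqrt(x)) for x > 0, taken as 0 for x <= 0; same for every n."""
    return 0.5 / math.sqrt(x) if x > 0 else 0.0


def q(n, w):
    """q_n(w) from (3.3), assuming mu_n = n (Poisson X with mean n)."""
    mu = float(n)
    return psi(mu + w / psi(mu)) / psi(mu)


def max_deviation(n, lo=-3.0, hi=3.0, steps=601):
    """Largest |q_n(w) - 1| over an evenly spaced grid on the closed interval [lo, hi]."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(abs(q(n, w) - 1.0) for w in grid)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, max_deviation(n))
```

Since $q_n(w) \approx 1 - w/\sqrt{n}$ for large $n$, the departure shrinks roughly like $3/\sqrt{n}$ on this interval, which is the uniform convergence required in (b).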




To prove this theorem we shall first suppose that condition (a) is satisfied. 
Let $w_1$ and $w_2$ be the end points of the open interval (possibly infinite) defined by 
$0 < F(w) < 1$. If $w$ lies in this interval, and if $n$ is large enough for the root 
$x_n$ in (3.1) to exist, then from the monotonic character of $\int_{\mu_n}^{x} \psi_n(u)\,du$ we can 

⁴ “Integrable” here means absolutely integrable in the sense of Lebesgue. 
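infer that 

$$ \begin{aligned} H_n(w) &= P[f(X) - f(\mu_n) \le w] = P\left[\int_{\mu_n}^{X} \psi_n(u)\,du \le \int_{\mu_n}^{x_n} \psi_n(u)\,du\right] \\ &= P(X \le x_n) = P[Y \le (x_n - \mu_n)\psi_n(\mu_n)] \\ &= F_n[(x_n - \mu_n)\psi_n(\mu_n)]. \end{aligned} \tag{3.4} $$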

Since $F(w)$ is continuous, $\lim_{n\to\infty} F_n(w) = F(w)$ uniformly on any finite or 
infinite interval of values of $w$, as is well known.⁵ Therefore $\lim_{n\to\infty} F_n(w_n) = F(w)$ 
if $\lim_{n\to\infty} w_n = w$. Thus from (3.2) and (3.4), we find that $\lim_{n\to\infty} H_n(w) = F(w)$ 
for $w_1 < w < w_2$. 

If $w' \le w_1$, and $w_1 < w'' < w_2$, then $0 \le H_n(w') \le H_n(w'') = F(w'') + 
[H_n(w'') - F(w'')]$. We can make the right hand member of this relation less 
than any given positive number $\varepsilon$ by first choosing $w''$ so that $F(w'') < \varepsilon/2$ (it 
will be remembered that $F(w)$ is a continuous d.f., and $F(w_1) = 0$) and then 
choosing $n$ so large that the quantity in square brackets is also less than $\varepsilon/2$ in 
absolute value. Thus $\lim_{n\to\infty} H_n(w') = 0$. Similarly if $w' \ge w_2$, we can show 
that $\lim_{n\to\infty} H_n(w') = 1$. Hence $\lim_{n\to\infty} H_n(w) = F(w)$ for all $w$, and it follows 
that the limit is uniform on any finite or infinite interval of values of $w$. 
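Continuing the same illustrative square-root example (Poisson $X$ with mean $n$, $f(X) = \sqrt{X}$ with $a = 0$, both assumptions), the conclusion of Theorem 3.1 says that $H_n(w) = P(\sqrt{X} - \sqrt{n} \le w)$ approaches the normal d.f. with mean 0 and standard deviation $1/2$. The Python sketch below (hypothetical function names) computes $H_n(w)$ exactly from the Poisson d.f. and reports the largest gap to the limit over a grid of $w$.

```python
import math
from statistics import NormalDist

LIMIT = NormalDist(0.0, 0.5)  # limiting d.f. F: mean 0, standard deviation 1/2


def poisson_cdf(k, mu):
    """P(X <= k) for Poisson X with mean mu, by direct summation (mu up to a few hundred)."""
    if k < 0:
        return 0.0
    p = math.exp(-mu)  # P(X = 0)
    total = 0.0
    for x in range(k + 1):
        total += p
        p *= mu / (x + 1)  # P(X = x + 1) from P(X = x)
    return min(total, 1.0)


def H(n, w):
    """H_n(w) = P(sqrt(X) - sqrt(n) <= w) for Poisson X with mean n."""
    root = math.sqrt(n) + w
    if root < 0:
        return 0.0
    return poisson_cdf(math.floor(root * root), n)


def sup_gap(n, lo=-3.0, hi=3.0, steps=121):
    """Largest |H_n(w) - F(w)| over an evenly spaced grid of w values."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(abs(H(n, w) - LIMIT.cdf(w)) for w in grid)


if __name__ == "__main__":
    for n in (16, 100, 400):
        print(n, sup_gap(n))
```

Because $X$ is discrete, $H_n$ is a step function and the gap shrinks only gradually (the largest Poisson jump is of order $1/\sqrt{2\pi n}$). Summing from $e^{-n}$ also limits this sketch to $n$ of a few hundred before $e^{-n}$ underflows.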

We shall now show that condition (a) in the theorem is a consequence of condition
(b). The result follows at once from the following simple lemma. 

Lemma. If $\gamma_n(w)$ is a non-negative function integrable over any finite interval 
of values of $w$, and if $\lim_{n\to\infty} \gamma_n(w) = 1$ uniformly in any finite closed subinterval of 
an interval $w_1 < w < w_2$, then for every value of $w$ in this interval there exists for all 
$n$ sufficiently large a solution $y = y_n$ of the equation $\int_0^{y} \gamma_n(z)\,dz = w$, and the 
solution $y_n$ has the property that $\lim_{n\to\infty} y_n = w$. 

For it is clear that if $w$ satisfies the inequality $w_1 < w < w_2$, and if $\eta > 0$ 
be chosen so that $w_1 < w - \eta < w + \eta < w_2$, then for all $n$ sufficiently large, 

$$ \int_0^{w-\eta} \gamma_n(z)\,dz \le w \le \int_0^{w+\eta} \gamma_n(z)\,dz. $$

Thus for each $n$ sufficiently large, there exists a root $y_n$ of the equation 
$\int_0^{y} \gamma_n(z)\,dz = w$, and furthermore, this root satisfies the inequality 
$w - \eta \le y_n \le w + \eta$. Since $\eta$ is arbitrarily small, the proof of the lemma is complete. 

To apply the lemma, we make the change of variables $z = (x - \mu_n)\psi_n(\mu_n)$ in the integral in (3.1), which reduces it to the form

$$\int_0^{y} q_n(z)\,dz, \qquad y = (x - \mu_n)\psi_n(\mu_n), \tag{3.5}$$

and the conclusion that (a) is implied by (b) now follows at once.
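A numerical sketch of the lemma, under an assumed concrete case: take the square-root transformation treated in section 4, with $\psi_n(x) = 1/(2\sqrt{x + a})$ and $\mu_n = n$. The change of variables leading to (3.5) then produces $q_n(z) = \psi_n(\mu_n + z/\psi_n(\mu_n))/\psi_n(\mu_n) = (1 + 2z/\sqrt{n + a})^{-1/2}$ for $z > -\tfrac12\sqrt{n + a}$, and $0$ otherwise. The hypothetical helpers below approximate $\int_0^{y} q_n(z)\,dz$ by Simpson's rule and locate the root $y_n$ by bisection. Bisection is legitimate because $q_n \ge 0$ makes the integral nondecreasing in $y$. For this particular $q_n$ the root has the closed form $y_n = w + w^2/(2\sqrt{n + a})$, so $y_n \to w$, as the lemma asserts.

```python
import math


def q_sqrt_poisson(z: float, n: float, a: float = 0.0) -> float:
    """q_n(z) = psi_n(mu_n + z/psi_n(mu_n)) / psi_n(mu_n) for psi_n(x) = 1/(2 sqrt(x + a)), mu_n = n."""
    s = math.sqrt(n + a)
    u = 1.0 + 2.0 * z / s
    return 1.0 / math.sqrt(u) if u > 0.0 else 0.0


def integral_from_zero(q, y: float, steps: int = 2000) -> float:
    """Composite Simpson approximation of the integral of q over [0, y]; y may be negative."""
    h = y / steps
    total = q(0.0) + q(y)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * q(k * h)
    return total * h / 3.0


def lemma_root(q, w: float, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Bisection for y_n with integral_0^{y_n} q = w.

    Assumes q >= 0 (so the integral is nondecreasing in y) and a bracket
    with integral(lo) < w <= integral(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if integral_from_zero(q, mid) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w = 1.0
for n in (100, 10_000):
    y_n = lemma_root(lambda z: q_sqrt_poisson(z, n), w, lo=-2.0, hi=4.0)
    print(n, y_n, w + w * w / (2.0 * math.sqrt(n)))  # numerical root vs closed form
```

The bracket $[-2, 4]$ is an arbitrary choice that happens to contain the root for these values of $n$ and $w$. In general it must lie inside the region where $q_n$ is defined.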

* See [7], Theorem 11, pp. 29-30; also [8].

We add the remark that the uniformity of the limit of $q_n(z)$ in condition (b) may be replaced by the condition that for each closed finite subinterval there exists a function $q(w)$ which dominates $q_n(w)$ for all $n$ sufficiently large.

Our second theorem, which is stated in the terminology and notation of Theorem 3.1, is concerned with the limit of the variance of $T = f(X)$. From the mere fact that the distribution of $W$ tends to a limiting form, it by no means follows that the mean and variance of the distribution of $W$ approach those of the limiting form, as may be shown by trivial examples. Thus additional hypotheses on $\psi_n(x)$ and on the behavior of the distribution of $Y$ become necessary.

Theorem 3.2. Let $T$ (or $f(X)$), $Y$, $F_n(y)$ and $F(y)$ be defined as in Theorem 3.1. Let the mean and variance of the distribution defined by $F(y)$ exist and have respective values $0$ and $c^2$. Then the following three conditions, taken together, are sufficient that

$$\lim_{n\to\infty}[E(T) - f(\mu_n)] = 0, \tag{3.6}$$

$$\lim_{n\to\infty}\sigma_T^2 = c^2: \tag{3.7}$$

(i) $E(Y^2)$ exists for $n > 0$, and $\lim_{n\to\infty} E(Y^2) = c^2$;

(ii) Condition (b) of Theorem 3.1 holds;

(iii) $f(Y[\psi_n(\mu_n)]^{-1} + \mu_n) - f(\mu_n) = O(|Y|)$ uniformly in $n$ as $|Y| \to \infty$.

As a preliminary step in the proof, we observe that (i) and the relations

$$\lim_{n\to\infty} F_n(y) = F(y), \qquad c^2 = \int_{-\infty}^{\infty} y^2\,dF(y)$$

imply that the improper integral $\int_{-\infty}^{\infty} y^2\,dF_n(y)$ converges uniformly in $n$ for $n > 0$. As the integrand is nonnegative, the following result is equivalent to the uniform convergence of the integral: for every $\epsilon > 0$, there exist numbers $A_1$ and $A_2$, $A_1 < A_2$, such that for all $n$ sufficiently large,

$$\int_{-\infty}^{A_1} y^2\,dF_n(y) + \int_{A_2}^{\infty} y^2\,dF_n(y) < \epsilon.$$


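To prove this, we write

$$\Big(\int_{-\infty}^{A_1} + \int_{A_2}^{\infty}\Big) y^2\,dF_n(y) = \Big\{\int_{-\infty}^{\infty} y^2\,dF_n(y) - \int_{-\infty}^{\infty} y^2\,dF(y)\Big\} + \Big\{\int_{A_1}^{A_2} y^2\,dF(y) - \int_{A_1}^{A_2} y^2\,dF_n(y)\Big\} + \Big\{\Big(\int_{-\infty}^{A_1} + \int_{A_2}^{\infty}\Big) y^2\,dF(y)\Big\}.$$

We first choose $A_1$ and $A_2$ so that the last bracket here is less than $\tfrac12\epsilon$ in absolute value. By condition (i), the first bracket approaches zero as $n$ tends to infinity, and the Helly-Bray theorem [10, p. 15] states that the second bracket also approaches zero as $n$ tends to infinity, so for all $n$ sufficiently large, the sum of the first two brackets is in absolute value less than $\tfrac12\epsilon$.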

It is important to notice that we can always choose $A_1$ and $A_2$ in the above demonstration so that $A_1 > w_1$, $A_2 < w_2$, where $w_1$ and $w_2$ are as usual the endpoints of the interval defined by $0 < F(w) < 1$.

To continue with the proof of the theorem, we remark that by a change of variables similar to the one used to derive (3.5), the function $W = f(X) - f(\mu_n)$ may be expressed as a function of $Y$ in the following manner:

$$W = Q_n(Y) = \int_0^{Y} q_n(z)\,dz,$$

where $q_n(w)$ is given by (3.3). In terms of $W$, (3.6) and (3.7) become, respectively,

$$\lim_{n\to\infty} E(W) = 0, \tag{3.8}$$

$$\lim_{n\to\infty}\{E(W^2) - [E(W)]^2\} = c^2, \tag{3.9}$$

and these are the equations which we now establish.

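Conditions (ii) and (iii) obviously imply that $\lim_{n\to\infty} Q_n(y) = y$ uniformly in any finite closed subinterval of the interval $w_1 < y < w_2$, and that a constant $M$ exists such that $|Q_n(y)| \le M|y|$ for all $n$. If $E(Y^2)$ exists, so will $E(Y)$. Now

$$E(W) = \int_{-\infty}^{\infty} Q_n(y)\,dF_n(y) = \Big(\int_{-\infty}^{A_1} + \int_{A_2}^{\infty}\Big)[Q_n(y) - y]\,dF_n(y) + \int_{A_1}^{A_2}[Q_n(y) - y]\,dF_n(y),$$

where $w_1 < A_1 < A_2 < w_2$. Therefore

$$|E(W)| \le \Big(\int_{-\infty}^{A_1} + \int_{A_2}^{\infty}\Big)(M + 1)|y|\,dF_n(y) + \int_{A_1}^{A_2}|Q_n(y) - y|\,dF_n(y).$$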

From the uniform convergence of $\int_{-\infty}^{\infty} y^2\,dF_n(y)$, proved above, we can conclude that the pair of improper integrals in this inequality can be made less than an arbitrary $\tfrac12\epsilon > 0$ by proper choice of $A_1$ and $A_2$. The third integral approaches zero, by the general Helly-Bray theorem [10, p. 16], and so becomes less than $\tfrac12\epsilon$ for all $n$ sufficiently large. Thus we have established (3.8). To show that (3.9) is true, we have merely to prove that $\lim_{n\to\infty} E(W^2) = c^2$. Since $E(Y^2) = \int_{-\infty}^{\infty} y^2\,dF_n(y)$, we may write

$$E(W^2) - E(Y^2) = \int_{-\infty}^{\infty}\{[Q_n(y)]^2 - y^2\}\,dF_n(y).$$
The integral may be shown to approach zero by the argument used in the case of $E(W)$, and the required result then follows from condition (i) of the theorem. The proof is now complete.

The sufficient conditions in Theorem 3.2 can be modified in various more or less obvious ways. The existence of the limiting d.f. $F(y)$ was essentially used in the proof only to secure the uniform convergence of $\int_{-\infty}^{+\infty} y^2\,dF_n(y)$. Condition (ii) can again be modified along the lines suggested at the end of the proof of Theorem 3.1. Condition (iii) was used only to secure the uniform convergence of the integral $\int_{-\infty}^{\infty}[Q_n(y)]^2\,dF_n(y)$.

For later reference, we shall supplement Theorems 3.1 and 3.2 with the following simple result, which is practically self-evident.

Theorem 3.3. Let the distribution of a variate $Y$ depend upon a parameter $n$, let $F_n(y)$ be the d.f. of $Y$, and let $F(y)$ be a continuous d.f. with the property that $\lim_{n\to\infty} F_n(y) = F(y)$. Let $a_n$ be a function of $n$ such that $\lim_{n\to\infty} a_n = a \ne 0$. Then the d.f. of the variate $Z = a_n Y$ tends as $n \to \infty$ to the d.f. $F(z/a)$ if $a > 0$, and to the d.f. $1 - F(z/a)$ if $a < 0$. If the variance of $Y$ exists and tends to $c^2$ as $n \to \infty$, then the variance of $a_n Y$ tends to $a^2 c^2$ as $n \to \infty$.

If $F(y)$ is the d.f. of a reduced normal distribution, i.e.,

$$F(y) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-t^2/2}\,dt,$$

then $F(z/a)$ is also the d.f. of a normal distribution with mean zero and variance $a^2$. More generally, any affine transformation of a normal variate yields another normal variate.


4. Applications. The theorems of the preceding section have the effect of referring the properties of the distribution of the transformation $T = f(X)$ of Theorem 3.1 back to those of the distribution of a related variate $Y$. In the applications given in the present section, we shall let $\psi_n(\mu_n)$ be proportional to the reciprocal of the standard deviation of $X$. The theorems of section 3 state in this case that if the reduced, or standardized, distribution of $X$ approaches a limiting form, then under certain circumstances, the distribution of $f(X) - f(\mu_n)$ will approach a similar limiting form, and $\sigma_T^2$ will approach a quantity independent at least of $n$. In the applications considered here, the reduced distribution of $X$ will always approach the reduced normal distribution.


(I) The square root transformation for a variate with a Poisson exponential distribution. Let $X$ have a Poisson exponential distribution with parameter $n$. If $a$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{X + a}, & X \ge -a, \\ 0, & X < -a, \end{cases} \tag{4.1}$$



J. H. CURTISS


then the distribution of $T - \sqrt{n + a}$ tends as $n \to \infty$ to a normal distribution which has mean zero and variance $\tfrac14$, and $\lim_{n\to\infty}\sigma_T^2 = \tfrac14$. For $\mu_n = n$, $\sigma_X = \sqrt{n}$, and it is well known* that the distribution of the reduced variate $(X - n)/\sqrt{n}$ tends to the reduced normal distribution as $n \to \infty$. By Theorem 3.3, the distribution of the variate

$$Y = \frac{X - n}{2\sqrt{n + a}} = \frac12\sqrt{\frac{n}{n + a}}\cdot\frac{X - n}{\sqrt{n}}$$

will tend to normality as $n \to \infty$, and the variance of $Y$ will tend to the value $\tfrac14$, which is also the variance of the limiting distribution. Setting

$$\psi_n(x) = \begin{cases} \dfrac{1}{2\sqrt{x + a}}, & x > -a, \\ 0, & x \le -a, \end{cases}$$

we obtain from $T = f(X) = \int_{-a}^{X}\psi_n(x)\,dx$ the formula given in (4.1). To prove the statement in italics, we must show that conditions (ii) and (iii) of Theorem 3.2 are satisfied. We have, assuming $n > -a$,



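$$q_n(w) = \begin{cases} \dfrac{\sqrt{n + a}}{\sqrt{n + a + 2w\sqrt{n + a}}}, & w > -\tfrac12\sqrt{n + a}, \\ 0, & w \le -\tfrac12\sqrt{n + a}, \end{cases}$$

so clearly (ii) is satisfied. Also,

$$W = f(Y[\psi_n(\mu_n)]^{-1} + \mu_n) - f(\mu_n) = \begin{cases} \sqrt{2Y\sqrt{n + a} + n + a} - \sqrt{n + a}, & Y > -\tfrac12\sqrt{n + a}, \\ -\sqrt{n + a}, & Y \le -\tfrac12\sqrt{n + a}, \end{cases}$$

from which it follows at once that $|W| \le 2|Y|$ for all $Y$ (for $Y > -\tfrac12\sqrt{n + a}$ one has $|W| = 2|Y|\sqrt{n + a}/\{\sqrt{2Y\sqrt{n + a} + n + a} + \sqrt{n + a}\}$), and so (iii) is satisfied.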
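A quick grid check of the inequality just used, under the same setting ($\mu_n = n$, $f(x) = \sqrt{x + a}$ for $x \ge -a$ and $0$ otherwise). The function `w_of_y` evaluates $W$ as a function of $Y$ from the case formula above, and `max_ratio` reports the largest value of $|W|/|Y|$ on a grid. Both names are hypothetical, and the grid and its range are arbitrary choices. This is a numerical sanity check, not a proof.

```python
import math


def w_of_y(y: float, n: float, a: float = 0.0) -> float:
    """W = f(Y/psi_n(mu_n) + mu_n) - f(mu_n) for f(x) = sqrt(x + a) (0 for x < -a) and mu_n = n."""
    s = math.sqrt(n + a)
    if y > -0.5 * s:
        return math.sqrt(2.0 * y * s + s * s) - s
    return -s


def max_ratio(n: float, a: float = 0.0) -> float:
    """Largest |W| / |Y| over the nonzero grid points Y = -50.00, -49.99, ..., 50.00."""
    ys = [k / 100.0 for k in range(-5000, 5001) if k != 0]
    return max(abs(w_of_y(y, n, a)) / abs(y) for y in ys)


for n in (1.0, 10.0, 100.0):
    print(n, max_ratio(n), max_ratio(n, a=0.5))
```

The ratio reaches its bound only at the corner $Y = -\tfrac12\sqrt{n + a}$, where $W = -\sqrt{n + a}$. This is why the inequality is stated as $\le$ rather than $<$.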

The degree of approximation involved in the equation $\lim_{n\to\infty}\sigma_T^2 = \tfrac14$ has been investigated numerically by Bartlett [1] for values of $n$ from .5 to 15.0 in the cases $a = 0$ and $a = \tfrac12$. He found that the variance of $\sqrt{X + \tfrac12}$ is considerably closer to the limit $\tfrac14$ for $1 \le n \le 10$ than is the variance of $\sqrt{X}$. At $n = 15$, the variance of $\sqrt{X}$ is .256, and that of $\sqrt{X + \tfrac12}$ is .248.
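The variances quoted from Bartlett can be checked by summing the Poisson series directly. The sketch below uses hypothetical helper names. It computes the exact variance of $\sqrt{X + a}$ for a Poisson variate with mean $n$, truncating the series far out in the upper tail. It assumes $a \ge 0$, so that $T = \sqrt{X + a}$ for every possible value of $X$.

```python
import math


def poisson_pmf(k: int, n: float) -> float:
    """P(X = k) for a Poisson variate with mean n > 0, computed on the log scale."""
    return math.exp(-n + k * math.log(n) - math.lgamma(k + 1))


def var_sqrt_poisson(n: float, a: float = 0.0) -> float:
    """Exact variance of T = sqrt(X + a), X Poisson with mean n; assumes a >= 0."""
    kmax = int(n + 40.0 * math.sqrt(n) + 40.0)  # probability beyond kmax is negligible
    m1 = m2 = 0.0
    for k in range(kmax + 1):
        pk = poisson_pmf(k, n)
        m1 += pk * math.sqrt(k + a)
        m2 += pk * (k + a)  # E(T^2) = E(X + a)
    return m2 - m1 * m1


for n in (1.0, 5.0, 15.0):
    print(n, round(var_sqrt_poisson(n), 4), round(var_sqrt_poisson(n, 0.5), 4))
```

Running the loop for other values of $n$ or $a$ is a cheap way to explore the open question, raised below, of an optimum $a$.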

The question of the degree of convergence to normality and of the possibility of selecting an optimum value of $a$ remains open. By expanding the function $\sqrt{X + a}$ in a Taylor series about $X = n$ with remainder in the form due to Schlömilch, it is possible to derive as accurate an estimate of $|\sigma_T^2 - \tfrac14|$ as may be desired. A rough result easily obtainable by this method is that $|\sigma_T^2 - \tfrac14| \le 3/(4n)$, $n > 0$.

* See (e.g.) [9].

(II) The square root transformation for a variate with a Γ distribution. Let $X$ have a distribution whose density function is of the following type:

$$\varphi(x) = \begin{cases} K x^{\frac n2 - 1} e^{-hx}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad h > 0. \tag{4.2}$$

If $a$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt{X + a}, & X \ge -a, \\ 0, & X < -a, \end{cases} \tag{4.3}$$


then the distribution of $T - \sqrt{n/(2h) + a}$ tends as $n \to \infty$ to a normal distribution which has mean zero and variance $1/(4h)$, and $\lim_{n\to\infty}\sigma_T^2 = 1/(4h)$. For $\mu_n = n/(2h)$, $\sigma_X = \sqrt{n}/(h\sqrt2) = \sqrt{\mu_n/h}$. The distribution of the reduced variate tends to normality as $n \to \infty$,* so that of the variate

$$Y = \frac{X - \mu_n}{2\sqrt{\mu_n + a}} = \frac{1}{2\sqrt h}\sqrt{\frac{n}{n + 2ha}}\cdot\frac{X - \mu_n}{\sqrt{\mu_n/h}}$$

tends to normality also, with limiting variance $1/(4h)$. Setting

$$\psi_n(x) = \begin{cases} \dfrac{1}{2\sqrt{x + a}}, & x > -a, \\ 0, & x \le -a, \end{cases}$$

we obtain $T$ in (4.3) from the relation $T = \int_{-a}^{X}\psi_n(x)\,dx$. The work of verifying that the conditions of Theorem 3.2 are satisfied is the same as in the case of the Poisson exponential distribution treated above, and will not be repeated.

For example, if $s^2$ denotes the variance of a random sample of $n + 1$ observations drawn from a normal parent distribution with variance $\sigma^2$, then it is well known that $(n + 1)s^2$ is distributed according to (4.2) with $h = 1/(2\sigma^2)$. We thus can deduce the further facts, also well known, that the distribution of $\sqrt{n + 1}\,s - \sigma\sqrt{n}$ tends to normality, and that the variance of $s\sqrt{n + 1}$ approaches the limiting value $\tfrac12\sigma^2$. If $n$ is an integer and $h = \tfrac12$, the distribution defined by (4.2) is called a $\chi^2$ distribution with $n$ degrees of freedom, and the variate is often denoted by $\chi^2$. Our conclusion in this case is that the distribution of $\sqrt{2\chi^2} - \sqrt{2n}$ tends to a normal one with zero mean and unit variance. From this result and the fact that $\sqrt{2n - 1} - \sqrt{2n} = O(n^{-1/2})$, it follows immediately that $\sqrt{2\chi^2} - \sqrt{2n - 1}$ has the same limiting distribution as $\sqrt{2\chi^2} - \sqrt{2n}$. This result,* due to Fisher, is familiar to all users of his table of the probability levels of $\chi^2$.
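For the $\chi^2$ case the needed moments are available in closed form: $E\sqrt{\chi^2} = \sqrt2\,\Gamma(\tfrac{n+1}{2})/\Gamma(\tfrac n2)$ and $E(\chi^2) = n$. This makes it easy to watch the limiting statements numerically. The sketch below uses illustrative function names. It shows the variance of $\sqrt{2\chi^2}$ approaching 1, and the mean of $\sqrt{2\chi^2}$ lying much closer to $\sqrt{2n - 1}$ than to $\sqrt{2n}$. This is one way to see why Fisher's centring works well at moderate $n$; it is an illustration, not Fisher's own derivation.

```python
import math


def mean_sqrt_chi2(n: int) -> float:
    """E[sqrt(chi^2)] with n degrees of freedom: sqrt(2) * Gamma((n + 1)/2) / Gamma(n/2)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))


def var_sqrt_2chi2(n: int) -> float:
    """Var[sqrt(2 chi^2)] = E(2 chi^2) - (E sqrt(2 chi^2))^2 = 2n - 2 (E sqrt(chi^2))^2."""
    return 2.0 * n - 2.0 * mean_sqrt_chi2(n) ** 2


for n in (10, 100, 1000):
    m = math.sqrt(2.0) * mean_sqrt_chi2(n)  # E[sqrt(2 chi^2)]
    print(n, var_sqrt_2chi2(n), m - math.sqrt(2 * n - 1), m - math.sqrt(2 * n))
```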


*See(e.g.)[0]. 

* For a discussion of the degree of convergence involved here, see [9].
(III) The inverse sine transformation for a binomial variate. Let $X$ have a binomial relative frequency distribution with parameter $p$ and the values $0, 1/n, 2/n, \cdots, n/n$. If $a$ is an arbitrary constant, and if

$$T = f(X) = \begin{cases} \sqrt n\,\sin^{-1}\sqrt{X + a/n}, & -a/n \le X \le 1 - a/n, \\ 0, & X < -a/n \ \text{or}\ X > 1 - a/n, \end{cases} \tag{4.4}$$

where $T$ is measured in radians, then the distribution of $T - \sqrt n\,\sin^{-1}\sqrt{p + a/n}$ tends as $n \to \infty$ to a normal distribution which has mean zero and variance $\tfrac14$, and $\lim_{n\to\infty}\sigma_T^2 = \tfrac14$. For here, $\mu_n = p$ and $\sigma_X^2 = pq/n$, where $q = 1 - p$; and the familiar DeMoivre-Laplace theorem states that the distribution of the reduced variate $\sqrt n(X - p)/\sqrt{pq}$ will tend to normality as $n \to \infty$. Hence by Theorem 3.3 the distribution of

$$Y = \frac{\sqrt n(X - p)}{2\sqrt{(p + a/n)(q - a/n)}} \tag{4.5}$$

will tend to normality with a limiting variance of $\tfrac14$, which is also the variance of the limiting distribution. Setting

$$\psi_n(x) = \begin{cases} \dfrac{\sqrt n}{2\sqrt{(x + a/n)(1 - x - a/n)}}, & -a/n < x < 1 - a/n, \\ 0, & x \le -a/n \ \text{or}\ x \ge 1 - a/n, \end{cases}$$

we obtain (4.4) from the integral $T = \int_{-a/n}^{X}\psi_n(x)\,dx$.

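In proving that the conditions (ii) and (iii) of Theorem 3.2 are satisfied, we shall assume for simplicity that $a = 0$. We find that

$$q_n(w) = \begin{cases} \Big[\Big(1 + 2w\sqrt{\dfrac{q}{np}}\Big)\Big(1 - 2w\sqrt{\dfrac{p}{nq}}\Big)\Big]^{-1/2}, & -\dfrac12\sqrt{\dfrac{np}{q}} < w < \dfrac12\sqrt{\dfrac{nq}{p}}, \\ 0, & \text{otherwise}, \end{cases}$$

so obviously (ii) is satisfied. From the Law of the Mean in the form due to Schlömilch, we have

$$W = \sqrt n\,\sin^{-1}\sqrt X - \sqrt n\,\sin^{-1}\sqrt p = \frac{2Y}{2\Big[\Big(1 + 2\theta Y\sqrt{\dfrac{q}{np}}\Big)\Big(1 - 2\theta Y\sqrt{\dfrac{p}{nq}}\Big)\Big]^{1/2}}, \qquad -\frac12\sqrt{\frac{np}{q}} < Y < \frac12\sqrt{\frac{nq}{p}}, \quad 0 < \theta < 1. \tag{4.6}$$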
The denominator of the coefficient of $2Y$ here is a quadratic function of $Y$ with a negative coefficient of $Y^2$, and so must assume its least value in the $Y$ range indicated in (4.6) at one end or the other of the range. From this it is readily seen that the coefficient of $2Y$ is actually always less than unity. For values of $Y$ outside the range, the second member of (4.6) indicates that $W = O(\sqrt n) = O(Y)$. Hence (iii) is satisfied, and the proof of the statement in italics is complete for the case $a = 0$. The more general case presents no important new difficulties.
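As with the Poisson case, the exact variance of the inverse sine transform is a finite sum. The sketch below takes $a = 0$ and computes $\sigma_T^2$ for $T = \sqrt n\,\sin^{-1}\sqrt X$ exactly from the binomial probabilities. It can be used to see how quickly $\sigma_T^2$ approaches $\tfrac14$ for various $p$. The helper names are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(nX = k) for a binomial count, computed on the log scale (0 < p < 1)."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1.0 - p))


def var_arcsine(n: int, p: float) -> float:
    """Exact variance of T = sqrt(n) * asin(sqrt(X)), X = k/n binomial (the case a = 0)."""
    m1 = m2 = 0.0
    for k in range(n + 1):
        w = binom_pmf(k, n, p)
        t = math.sqrt(n) * math.asin(math.sqrt(k / n))
        m1 += w * t
        m2 += w * t * t
    return m2 - m1 * m1


for p in (0.1, 0.3, 0.5):
    print(p, [round(var_arcsine(n, p), 4) for n in (10, 100, 1000)])
```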

In practice, it is often convenient to express $X$ as a percentage. This merely has the effect of multiplying $Y$ in (4.5) by 100. We find in this case that $\sqrt n\,\sin^{-1}\sqrt{X + 100a/n} - \sqrt n\,\sin^{-1}\sqrt{100p + 100a/n}$ has a distribution approaching normality, and $\sigma_T \to 50$ instead of $\tfrac12$.

Bartlett [1] gives numerical results in the cases n = 10, a = 0 and n = 10, 
a = I, which indicate that perhaps the choice a = ^ is more suitable if the 
estimated p is near 0 or 1, but the choice a = 0 is preferable if the estimated p
lies between .3 and .7. However, there seems to be no good reason to believe 
that these conclusions should be valid for other values of n. The question of an 
optimum a, and of the degree of convergence to normality remain open. We 
note in passing that the latter problem could doubtless be profitably studied by 
combining the methods of proof of Theorem 3.1 with the results of Uspensky 
[15, pp. 129-130] on the degree of approximation of the reduced binomial d.f. 
to the normal d.f. 

(IV) Other transformations of a binomial variate. Let $X$ have a binomial relative frequency distribution with the parameter $p$ and the values $0, 1/n, 2/n, \cdots, n/n$.

(a) If

$$T = f(X) = \begin{cases} \sqrt n\,\sinh^{-1}\sqrt X = \sqrt n\,\log(\sqrt X + \sqrt{1 + X})\,{}^{*}, & X \ge 0, \\ 0, & X < 0, \end{cases}$$

then the distribution of $T - \sqrt n\,\sinh^{-1}\sqrt p$ tends as $n \to \infty$ to a normal distribution which has mean zero and variance $q/(4 + 4p)$, and $\lim_{n\to\infty}\sigma_T^2 = q/(4 + 4p)$.

(b) If

$$T = f(X) = \begin{cases} \sqrt n\,\log X, & X > 0, \\ 0, & X \le 0, \end{cases}$$

then the distribution of $T - \sqrt n\,\log p$ tends as $n \to \infty$ to a normal distribution which has mean zero and variance $q/p$, and $\lim_{n\to\infty}\sigma_T^2 = q/p$.

(c) If

$$T = f(X) = \begin{cases} \sqrt n\,\log\dfrac{X}{1 - X}, & 0 < X < 1, \\ 0, & X \le 0 \ \text{or}\ X \ge 1, \end{cases}$$

then the distribution of $T - \sqrt n\,\log\dfrac{p}{1 - p}$ tends as $n \to \infty$ to a normal distribution which has mean zero and variance $1/(pq)$, and $\lim_{n\to\infty}\sigma_T^2 = 1/(pq)$.

* All logarithms in this paper are to the base e.

Since the limiting variance of each of these transformations involves the 
parameter p, they are not to be regarded as solutions of the problem of asymp- 
totic variance stabilization proposed at the beginning of section 3, although it is 
perhaps of some interest that their distributions become asymptotically normal. 
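The limiting variances in (a), (b) and (c) can likewise be compared with exact binomial variances. In the sketch below, each transformation is passed as a function of $X$ that returns `None` wherever the text sets $T = 0$. All names are hypothetical, and the choice $n = 2000$, $p = 0.4$ in the usage lines is arbitrary.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(nX = k) for a binomial count, computed on the log scale (0 < p < 1)."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1.0 - p))


def exact_var(g, n: int, p: float) -> float:
    """Exact variance of T = sqrt(n) * g(X), X = k/n, with T = 0 wherever g returns None."""
    m1 = m2 = 0.0
    for k in range(n + 1):
        gx = g(k / n)
        t = math.sqrt(n) * gx if gx is not None else 0.0
        w = binom_pmf(k, n, p)
        m1 += w * t
        m2 += w * t * t
    return m2 - m1 * m1


def g_asinh(x):  # case (a): sinh^{-1} sqrt(x), x >= 0
    return math.asinh(math.sqrt(x)) if x >= 0.0 else None


def g_log(x):  # case (b): log x, x > 0
    return math.log(x) if x > 0.0 else None


def g_logit(x):  # case (c): log(x / (1 - x)), 0 < x < 1
    return math.log(x / (1.0 - x)) if 0.0 < x < 1.0 else None


n, p = 2000, 0.4
q = 1.0 - p
for name, g, limit in (("(a)", g_asinh, q / (4 + 4 * p)), ("(b)", g_log, q / p), ("(c)", g_logit, 1 / (p * q))):
    print(name, exact_var(g, n, p), limit)
```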

In case (a), $f'(x) = \sqrt n/(2\sqrt{x^2 + x})$, $x > 0$. Setting $\psi_n(x) = f'(x)$, $x > 0$, and $\psi_n(x) = 0$, $x \le 0$, we obtain

$$Y = (X - p)\psi_n(p) = \frac{\sqrt n(X - p)}{\sqrt{pq}}\cdot\frac{\sqrt q}{2\sqrt{1 + p}}, \tag{4.7}$$

and this variate obviously has the limiting distribution ascribed to $T - \sqrt n\,\sinh^{-1}\sqrt p$ in the statement in italics. The truth of that statement now follows by an argument similar to that used in the case of the inverse sine transformation.

If $p$ is allowed to vary with $n$ in such a way that $\lim_{n\to\infty} npq = \infty$, it is known that the reduced distribution of $X$ will still tend to normality.* If we suppose that $\lim_{n\to\infty} p = 0$, but $\lim_{n\to\infty} np = \infty$, we find from Theorem 3.3 that the limiting distribution of $Y$ in (4.7) will be normal with mean zero and variance $\tfrac14$, and that $\sigma_Y^2 \to \tfrac14$. It is easily verified that the conditions (ii) and (iii) of Theorem 3.2 are still satisfied, so we find that the limiting distribution of $\sqrt n\,\sinh^{-1}\sqrt X - \sqrt n\,\sinh^{-1}\sqrt p$ is normal, with mean zero and variance $\tfrac14$, and $\sigma_T^2 \to \tfrac14$. However, since $n$ is now the only independent parameter, we cannot here regard the transformation $T = \sqrt n\,\sinh^{-1}\sqrt X$ as a solution of the problem of variance stabilization, because the variate $T$ depends explicitly upon $n$.

If in case (b) we proceed as in case (a), we obtain as the analogue of (4.7) the formula

$$Y = (X - p)\frac{\sqrt n}{p} = \frac{\sqrt n(X - p)}{\sqrt{pq}}\cdot\sqrt{\frac qp},$$

and this variate has the limiting distribution ascribed to $T - \sqrt n\,\log p$ in the statement in italics. It now turns out that although condition (ii) of Theorem 3.2 is satisfied, condition (iii) is not satisfied. We are then faced with the problem of proving directly that the improper integral

$$\int_{-\sqrt n}^{\infty}\big[\sqrt n\,\log(p + py/\sqrt n) - \sqrt n\,\log p\big]^2\,dF_n(y)$$

converges uniformly.* The trouble occurs only at the lower limit of integration, and may be resolved by first integrating by parts, then dividing the range $(-\sqrt n, A_1)$ into two ranges $(-\sqrt n, -\log n)$ and $(-\log n, A_1)$, and then applying Uspensky's results [15, pp. 129-130] on the degree of approximation involved in the DeMoivre-Laplace theorem.

* See (e.g.) [9].

* See the remarks following the proof of Theorem 3.2.

Case (c) may be handled in a similar manner. 

5. The logarithmic transformation. We shall suppose throughout this section that $X$ is a variate whose mean $\mu_n$ and standard deviation $\sigma$ satisfy the relation $\sigma = k_n(\mu_n + \alpha)$, where $\alpha$ is an arbitrary constant, $k_n > 0$, and $\lim_{n\to\infty} k_n$ exists and is finite. If $k_n$ is constant for all $n$, say $k_n = k > 0$, and if we use the heuristic argument of the second paragraph of section 2 to attempt to find a transformation which will stabilize the variance of $X$ at $k^2$, we arrive at the function $T = \log(X + \alpha)$, $X > -\alpha$. It is the purpose of this section to study the asymptotic properties of this transformation.

The theory of such a transformation differs in certain important respects from that of the transformations considered in sections 3 and 4. For one thing, our starting point in the study of each transformation considered in section 4 was the fact that although $P(X < 0) = 0$, nevertheless the reduced distribution of $X$ tended to normality as $n \to \infty$. But in the present case, if $X$ is a variate such that $P(X \le -\alpha) = 0$, then the corresponding reduced variate $Y = (X - \mu_n)/[k_n(\mu_n + \alpha)]$ has a d.f. $F_n(y)$ such that $F_n(-1/k_n) = 0$. Thus if $\lim_{n\to\infty} k_n = k > 0$, the limiting distribution of $Y$, if it exists, must have a d.f. $F(y)$ such that $F(-1/k - 0) = 0$. Therefore the limiting distribution of $Y$ can never be normal if $k > 0$.

Moreover (in contrast to the situation in Theorem 3.1) if the reduced variate $Y$ does have a limiting distribution, the variate

$$W = k_n^{-1}[\log(X + \alpha) - \log(\mu_n + \alpha)] = k_n^{-1}\log(1 + k_n Y), \qquad X > -\alpha, \tag{5.1}$$

may have a limiting distribution which is not the same as that of $Y$. More specifically, we have the following result:

Theorem 5.1. Let $P(X \le -\alpha) = 0$, let $\lim_{n\to\infty} k_n = k \ge 0$, let $F_n(y)$ be the d.f. of the reduced variate

$$Y = \frac{X - \mu_n}{k_n(\mu_n + \alpha)},$$

and let $H_n(w)$ be the d.f. of the variate $W$ given by (5.1). If a continuous d.f. $F(y)$ exists such that $\lim_{n\to\infty} F_n(y) = F(y)$ for all $y$, then

$$\lim_{n\to\infty} H_n(w) = \begin{cases} F[(e^{kw} - 1)/k], & k > 0, \\ F(w), & k = 0. \end{cases}$$
The proof is simpler than the statement; essentially we have only to notice that

$$H_n(w) = P\Big[-\frac{1}{k_n} < Y \le \frac{e^{k_n w} - 1}{k_n}\Big], \qquad -\infty < w < \infty,$$

and apply the reasoning used above in connection with (3.4).

From the study of the distribution of $T$, we now turn for a moment to the question of the limit of $\sigma_T$. Here the situation is more consistent with the results of section 3.

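Theorem 5.2. Under the hypotheses of Theorem 5.1 and under the additional conditions that the improper integral

$$\int_{-\infty}^{\infty} w^2\,dH_n(w) = \int_{-1/k_n}^{\infty} k_n^{-2}[\log(1 + k_n y)]^2\,dF_n(y)$$

converges uniformly in $n$ and that $\int_{-\infty}^{\infty} y^2\,dF(y) = 1 = E(Y^2)$, the following relations hold:

$$\lim_{n\to\infty} E(W) = \begin{cases} \dfrac1k\displaystyle\int_{-1/k}^{\infty}\log(1 + ky)\,dF(y), & k > 0, \\ 0, & k = 0, \end{cases} \tag{5.2}$$

$$\lim_{n\to\infty} E(W^2) = \begin{cases} \dfrac{1}{k^2}\displaystyle\int_{-1/k}^{\infty}[\log(1 + ky)]^2\,dF(y), & k > 0, \\ 1, & k = 0. \end{cases} \tag{5.3}$$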

The variance of the variate $T = \log(X + \alpha)$ is related to these mean values by the equation $\sigma_T^2 = k_n^2\{E(W^2) - [E(W)]^2\}$. Thus if $F(y)$ is independent of any unknown parameters and if $k$ is positive and is presumed to have the same value for all variates in any given problem, then the transformation $T = \log(X + \alpha)$ is seen to yield an asymptotic stabilization of the variance under the conditions of Theorem 5.2. If $k = 0$, we find from either Theorem 5.2 or the proof of Theorem 5.2 that $T = \log(X + \alpha)$ converges stochastically to $\log(\mu_n + \alpha)$.

The proof of Theorem 5.2 is similar to that of Theorem 3.2 and will be omitted. Theorem 5.1 raises the following question: just what limiting distribution must $Y$ have if $k > 0$, in order that the distribution of $W$ tend to normality? To answer this, we shall note the following simple non-asymptotic result:

Theorem 5.3. A necessary and sufficient condition that $X$ have a continuous distribution with density function

$$\varphi(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi\log(k^2 + 1)}}\cdot\dfrac{1}{x + \alpha}\exp\Big[-\dfrac{\{\log[(x + \alpha)\sqrt{k^2 + 1}/(\mu + \alpha)]\}^2}{2\log(k^2 + 1)}\Big], & x > -\alpha, \\ 0, & x \le -\alpha, \end{cases} \tag{5.4}$$

for which $\sigma_X = k(\mu + \alpha)$, is that the variate $T = \log(X + \alpha)$ have a normal distribution with mean $\log(\mu + \alpha) - \log\sqrt{k^2 + 1}$ and variance $\log(k^2 + 1)$.

The proof may be given by a routine change of variables.* It is to be noticed that the heuristic argument of the second paragraph of section 2 would lead to the incorrect result that the variance of $T$ was $k^2$ instead of $\log(k^2 + 1)$. In case $k = 1$, the mean and variance of $T$ are respectively $\log(\mu + \alpha) - .347$ and $.693$. If the transformation $T = \log_{10}(X + \alpha)$ is used, the new mean is $\log_{10}(\mu + \alpha) - \log_{10}\sqrt{k^2 + 1}$ and the new variance is $.189\log(k^2 + 1)$, which for values of $k$ near zero has the approximate value $.189k^2$.**
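The constants in the preceding paragraph follow from the normal law of Theorem 5.3 together with the standard lognormal moment formulas $E(X + \alpha) = e^{m + s^2/2}$ and $\sigma_{X + \alpha} = E(X + \alpha)\sqrt{e^{s^2} - 1}$, where $m$ and $s^2$ denote the mean and variance of $T$. The sketch below uses hypothetical helper names. It recovers $\mu$ and $\sigma_X = k(\mu + \alpha)$ from those parameters and reproduces the numbers .347, .693 and .189.

```python
import math


def log_scale_params(mu: float, alpha: float, k: float):
    """Mean and variance of T = log(X + alpha) under (5.4), as stated in Theorem 5.3."""
    var_t = math.log(k * k + 1.0)
    mean_t = math.log(mu + alpha) - math.log(math.sqrt(k * k + 1.0))
    return mean_t, var_t


def moments_of_x(mean_t: float, var_t: float, alpha: float):
    """Mean and s.d. of X when log(X + alpha) is normal (lognormal moment formulas)."""
    m = math.exp(mean_t + 0.5 * var_t)         # E(X + alpha)
    sd = m * math.sqrt(math.exp(var_t) - 1.0)  # s.d. of X + alpha, which equals s.d. of X
    return m - alpha, sd


mean_t, var_t = log_scale_params(mu=10.0, alpha=2.0, k=1.0)
print(moments_of_x(mean_t, var_t, alpha=2.0))  # mu and k(mu + alpha)
print(math.log(12.0) - mean_t, var_t)          # the .347 and .693 of the text
print(math.log10(math.e) ** 2)                 # the factor .189 for base-10 logarithms
```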

If $X$ is distributed according to (5.4), the density function $F'(y)$ of the corresponding reduced variate $Y = (X - \mu)/[k(\mu + \alpha)]$ is

$$F'(y) = \begin{cases} \dfrac{k}{\sqrt{2\pi\log(k^2 + 1)}}\cdot\dfrac{1}{1 + ky}\exp\Big[-\dfrac{\{\log[(1 + ky)\sqrt{k^2 + 1}]\}^2}{2\log(k^2 + 1)}\Big], & y > -\dfrac1k, \\ 0, & y \le -\dfrac1k. \end{cases} \tag{5.5}$$

The d.f. of the variate $W = k^{-1}[\log(X + \alpha) - \log(\mu + \alpha)]$ is $F[(e^{kw} - 1)/k]$, and, of course, the distribution of $W$ is normal with mean $-k^{-1}\log\sqrt{k^2 + 1}$ and variance $k^{-2}\log(k^2 + 1)$. The mean is the value of the integral in (5.2), and the variance is the value of the integral in (5.3) less the square of the mean.
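Formula (5.5) and the statements about $W$ can be checked by quadrature. The sketch below integrates against the density (5.5) with $k = \tfrac12$. Its helper names are hypothetical, and Simpson's rule and the cutoff $y \le 100$ are arbitrary numerical choices. It confirms that the reduced variate has mean 0 and variance 1, that $E(W) = -k^{-1}\log\sqrt{k^2 + 1}$, and that $E(W^2) - [E(W)]^2 = k^{-2}\log(k^2 + 1)$.

```python
import math


def reduced_density(y: float, k: float) -> float:
    """F'(y) of (5.5): density of Y = (X - mu)/[k(mu + alpha)] when X follows (5.4)."""
    u = 1.0 + k * y
    if u <= 0.0:
        return 0.0
    s2 = math.log(k * k + 1.0)
    z = math.log(u * math.sqrt(k * k + 1.0))
    return k / (math.sqrt(2.0 * math.pi * s2) * u) * math.exp(-z * z / (2.0 * s2))


def simpson(f, a: float, b: float, steps: int) -> float:
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def expect(g, k: float, upper: float = 100.0, steps: int = 40_000) -> float:
    """Approximate E[g(Y)], the integral of g(y) F'(y) dy over (-1/k, upper)."""
    def integrand(y: float) -> float:
        if 1.0 + k * y <= 0.0:
            return 0.0  # F'(y) vanishes here, and g may be undefined (e.g. log(1 + ky))
        return g(y) * reduced_density(y, k)
    return simpson(integrand, -1.0 / k, upper, steps)


k = 0.5
mean_w = expect(lambda y: math.log(1.0 + k * y) / k, k)
second_w = expect(lambda y: (math.log(1.0 + k * y) / k) ** 2, k)
print(expect(lambda y: 1.0, k), expect(lambda y: y, k), expect(lambda y: y * y, k))
print(mean_w, -math.log(math.sqrt(k * k + 1.0)) / k)
print(second_w - mean_w ** 2, math.log(k * k + 1.0) / k ** 2)
```

The factor $k$ in the numerator of (5.5) comes from $dx = k(\mu + \alpha)\,dy$. Without it the density would integrate to $1/k$ instead of 1.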

If now the distribution of $X$ depends on a parameter $n$ in such a way that as $n \to \infty$, the distribution of the corresponding reduced variate $Y = (X - \mu_n)/[k_n(\mu_n + \alpha)]$ tends to the distribution given by (5.5), it follows from the above remarks and from Theorem 5.1 that the variate $W$ given by (5.1) has a normal limiting distribution. Furthermore, under the uniform convergence condition of Theorem 5.2, it follows that $\sigma_T^2$ tends to the value $\log(k^2 + 1)$, where $T = \log(X + \alpha)$.

These facts provide a sound mathematical basis for the use of the logarithmic transformation, which has had a long history of empirical success in problems of normalization [12, chapter XVI] and stabilization ([6], [16]). When it appears from a reasonably large number of observations on a variate (which is essentially bounded from below) that the standard deviation of the variate is proportional to the mean, then a possible specification for the variate is a distribution of the form (5.4); or, at least for large values of $\mu$, it may be assumed that the distribution of the reduced variate is given by (5.5). Then the variate $T = \log(X + \alpha)$, where $-\alpha$ is any number less than the lower bound of $X$, will be exactly or approximately normally distributed with a variance independent of the value of $\mu$.

Since (5.4) is only one of an infinity of various different types of distribution in which the mean and standard deviation are proportional, the user of a logarithmic transformation in the analysis of variance should always apply tests for departure from normality to the observed distribution of $T$ values. From the point of view of specification, the situation here would seem to be less reassuring than in the cases considered in section 4. While it is true that the Poisson exponential distribution is only one of many types of distribution in which the variance and mean are equal, nevertheless the specification of a Poisson distribution can generally be preceded by a fairly strong chain of a priori inductive reasoning. This would not seem to be the case in the specification of (5.4). Theorems 5.1 and 5.2 furnish some grounds for a suspicion that the logarithmic transformation may possibly be more successful in stabilizing the variance than in normalizing the data. The burden of proof, however, lies with the experimenter.***

* Finney [11] has considered the problem of efficiently estimating the variance of the $X$ of Theorem in the case $\alpha = 0$. (The actual density function (5.4) appears nowhere in his paper.)

** Given (without explanation) by Cochran [6, p. 186].
*** A transformation closely related to the logarithmic one is $T = \sinh^{-1}(k\sqrt X)$, where $k$ is an estimate of the Charlier coefficient of disturbancy of a Poisson distribution. This transformation has recently been studied from an empirical point of view by Beall [2]; it was suggested by the heuristic argument of section 2 applied to the case in which $\sigma^2 = \mu + k^2\mu^2$. Beall presents evidence that for the particular data which he considered, the transformation seemed to stabilize the variance and normalize. A mathematical theory would follow the lines laid down above in the case of $T = \log(X + \alpha)$.
We consider a probability function $P(E)$ defined over the Borel set of events generated by the $n$ arbitrary events $E_1, \cdots, E_n$, which will be denoted by $\mathcal{E}(1, \cdots, n)$.

We use the same notations as in the author's former paper,* with the following abbreviations. We denote a combination $(\alpha_1 \cdots \alpha_a)$ simply by $(\alpha)$, and use the corresponding Latin letter $a$ for its number of members. Similarly we write $(\beta)$ for $(\beta_1 \cdots \beta_b)$, but $(\nu)$ for $(1, \cdots, n)$. We say that $(\beta)$ belongs to $(\alpha)$ and write $(\beta) \subset (\alpha)$ when and only when the set $(\beta_1 \cdots \beta_b)$ is a subset of $(\alpha_1 \cdots \alpha_a)$. Then and then only we write $(\alpha) - (\beta)$ for the subset of elements of $(\alpha)$ that do not belong to $(\beta)$; thus we may write it as $(\gamma)$ with $c = a - b$. When and only when $(\alpha)$ and $(\beta)$ have no common elements, we write $(\alpha) + (\beta)$ for the set of elements that belong either to $(\alpha)$ or to $(\beta)$; thus we may write it as $(\gamma)$, with $c = a + b \le n$. We note the case for empty sets: $(0) + (0) = (0)$. Now we can write $p[(\alpha)]$ for $p[\alpha_1 \cdots \alpha_a]$, $p((\alpha))$ for $p_{\alpha_1 \cdots \alpha_a}$, $p_b((\alpha))$ for $p_b(\alpha_1 \cdots \alpha_a)$, etc.

Further we denote by $p_{[b]}((\alpha))$ ($1 \le b \le a \le n$) the probability of the occurrence of exactly $b$ events out of $E_{\alpha_1}, \cdots, E_{\alpha_a}$, and write

$$P_m^{(a)}((\nu)) = \sum_{(\alpha)\subset(\nu)} p_m((\alpha)), \qquad P_{[m]}^{(a)}((\nu)) = \sum_{(\alpha)\subset(\nu)} p_{[m]}((\alpha));$$

since $a$ is fixed by the left-hand sides, the summations on the right-hand sides are to be extended to all the $\binom{n}{a}$ $a$-combinations of $(\nu)$.

A sum written $\sum_{(\beta)\subset(\alpha)}$ is to be extended to all combinations $(\beta)$, $b = 0, 1, \cdots, a$, belonging to $(\alpha)$, when $b$ is not previously fixed; it is to be extended to all the $\binom{a}{b}$ $b$-combinations belonging to $(\alpha)$, when $b$ is previously fixed.

Definition 1. A system of quantities is said to form a fundamental system of probabilities for a set of events if and only if the probability of every event in the set can be expressed in terms of these quantities.

Definition 2. An event in $\mathcal{E}(1, \cdots, n)$ is said to be symmetrical if and only if it is identical with every event obtained by interchanging any pair of suffixes $(i, j)$ ($i, j = 1, \cdots, n$) in the definition of it. The subset of symmetrical events in $\mathcal{E}(1, \cdots, n)$ will be denoted by $S(1, \cdots, n)$.

From the normal form* of every event in $\mathcal{E}(1, \cdots, n)$ and the principle of total probabilities, we can easily see the truth of the following theorems, which may of course be made more precise.

* “On the probability of the occurrence of at least m events among n arbitrary events,” Annals of Math. Stat., Vol. 12, 1941.

* See Hilbert-Ackermann, Grundzüge der theoretischen Logik, Chap. 1.

Theorem. The system of $p[(\alpha)]$, $(\alpha) \subset (\nu)$, $2^n$ in number, forms a fundamental system for $\mathcal{E}(1, \cdots, n)$.

Theorem. The system of $p_{[a]}((\nu))$, $0 \le a \le n$, $n + 1$ in number, forms a fundamental system for $S(1, \cdots, n)$.

Next, a theorem of Broderick,* in a less precise form, may be stated:

The system of $p((\alpha))$ ($p((0)) = 1$), $(\alpha) \subset (\nu)$, $2^n$ in number, forms a fundamental system for $\mathcal{E}$.

We may add in an easy way the following

Theorem. The system of $S_a((\nu))$ ($S_0((\nu)) = 1$), $0 \le a \le n$, $n + 1$ in number, forms a fundamental system for $S$.

In the present paper we shall prove, inter alia, the following four theorems of the above type, stated in more precise forms.

Theorem 1. For any $E$ in $\mathcal{E}$, we have

$$P(E) = c_0 + \sum_{\substack{(\alpha)\subset(\nu)\\ a\ne0}} c_\alpha\,p_1((\alpha)),$$

where $c_0 = 0$ or $1$ and the $c_\alpha$'s are integers; and they are unique.*

Theorem 2. For any $E$ in $S$, we have

$$P(E) = c_0 + \sum_{a=1}^{n} c_a\,P_1^{(a)},$$

where $c_0 = 0$ or $1$ and the $c_a$'s are integers; and they are unique.

Theorem 3. For any $E$ in $\mathcal{E}$, we have

$$P(E) = d_0 + \sum_{\substack{(\alpha)\subset(\nu)\\ a\ne0}} d_\alpha\,p_{[1]}((\alpha)),$$

where $d_0 = 0$ or $1$ and the $d_\alpha$'s are rational numbers; and they are unique.

Theorem 4. For any $E$ in $S$, we have

$$P(E) = d_0 + \sum_{a=1}^{n} d_a\,P_{[1]}^{(a)},$$

where $d_0 = 0$ or $1$ and the $d_a$'s are rational numbers; and they are unique.

Less precisely, we may say that the system of $p_1((\alpha))$ or $p_{[1]}((\alpha))$ forms a fundamental system for $\mathcal{E}$; the system of $P_1^{(a)}((\nu))$ or $P_{[1]}^{(a)}((\nu))$ forms a fundamental system for $S$.

In fact, however, we shall give much more than the mere proofs of these theorems. We shall establish the following explicit formulas for the general parameter $m$.

* Fréchet, “Compléments à un théorème de T. S. Broderick concernant les événements dépendants,” Proc. Edinburgh Math. Soc., Ser. 2, Vol. 6 (1939).

* “Unique” in the sense that it is impossible to replace therein the coefficients $c$ by other numbers which are independent of the Borel set of events and the probability function.

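$$\text{(i)}\quad p[(0)] = 1 - p_1((\nu)),$$

$$\text{(ii)}\quad p[(\alpha)] = \sum_{\substack{(\beta)\subset(\alpha)\\ n-a+b\ne0}} (-1)^{b+1}\,p_1((\nu) - (\alpha) + (\beta)){}^{*}, \qquad 1 \le a \le n. \tag{1.1}$$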


(1) 


p,..., = (-1)-^ S Z (-1) U + J J 

W- 1 e""tn <f>-nuac(0»e— a) T" O Wl/ 


22 P*((y) — (5) + (5)), n > o > m > 2.® 

<»).(►)-(«) 

(>)-(•)*<«) 


$$p_{[a]}((\nu)) = \sum_{\substack{b=n-a\\ b\ne0}}^{n} (-1)^{b-n+a-1}\binom{b}{n-a}\,P_1^{(b)}((\nu)), \qquad 1 \le a \le n. \tag{2.1}$$


(2) Pt.]((v)) = E (- l)‘-"L(n, o, fe, n>a>m>2, 

b*m 

where 


L(n, a, b^m) — 


(-ir 


c-.r 


, 6<n — o + m — 1, 
, 6 = n — o + wi — 1, 


(3) (i) 


(ii) 


(-l)“~“(OT-l)!(6-m)! 

• (o — ot) I {oft — n(m — 1 } 6>n — o + m — 1. 

I, ol (n — a) ! (o + 6 — n — m + 1) !’ 

Pr(.,i = (-i)*"^i: “T"”' (-i)'"C + 7i J" 

ft e—m <f"-inax(0.e^) i (* 

X] Pw((7) “ (5) + (5)), n > a > m > 1. 

(b)€(p)— (a) 

(7)-(«)«(«) 


(4) Pw((»'))= Z (-1)’*-“+®-** 

b*m+n— a 


C::)(:p 




A simpler derivation of (1) than that given in an earlier paper* follows. Let us write Poincaré's formula as follows:

$$p_m((\beta)) = \sum_{c=m}^{b} (-1)^{c-m}\binom{c-1}{m-1} S_c((\beta)).$$
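This form of Poincaré's formula is easy to test by brute force. The sketch below uses hypothetical names throughout. It puts an arbitrary random probability on the $2^n$ atoms, each identified with the set of events that occur. It computes $S_c$ as $\sum p((\gamma))$ over $c$-combinations, using the fact that an atom in which $j$ events occur contributes to exactly $\binom{j}{c}$ of those terms, and it compares both sides of the formula for $(\beta) = (\nu)$.

```python
import random
from itertools import combinations
from math import comb


def random_atoms(n: int, seed: int = 0) -> dict:
    """A random probability on the 2**n atoms; each atom is the frozenset of events that occur."""
    rng = random.Random(seed)
    atoms = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
    weights = [rng.random() for _ in atoms]
    total = sum(weights)
    return {atom: w / total for atom, w in zip(atoms, weights)}


def S(atoms: dict, c: int) -> float:
    """S_c((nu)): sum of p((gamma)) over c-combinations; an atom with j events counts comb(j, c) times."""
    return sum(pr * comb(len(atom), c) for atom, pr in atoms.items())


def at_least(atoms: dict, m: int) -> float:
    """p_m((nu)) computed directly: probability that at least m of the events occur."""
    return sum(pr for atom, pr in atoms.items() if len(atom) >= m)


def at_least_poincare(atoms: dict, n: int, m: int) -> float:
    """Right-hand side: sum_{c=m}^{n} (-1)^(c-m) C(c-1, m-1) S_c, for m >= 1."""
    return sum((-1) ** (c - m) * comb(c - 1, m - 1) * S(atoms, c) for c in range(m, n + 1))


atoms = random_atoms(5)
for m in range(1, 6):
    print(m, at_least(atoms, m), at_least_poincare(atoms, 5, m))
```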


• Obviously we mean ((f) — (o)) + (fi) and (( 7 ) — (S)) -f (a) respectively; similarly in the 
sequel. 





KAI LAI CHUNG


Then for a fixed $b \ge m$, summing over all $(\beta) \subset (\nu)$, we get

Ep.(W)-t 


Hence 


£ (-1)- MW) - 1 (: : 1) «W) £ {I : :) 


A change of notation gives, for $a + b \ge m$,


C:-7‘) 


Pf(«)+(/»)) 


0— m (y)«(a)+(^) 


Hence 


/a + 6 - A 2 

\ m 1 / 0)«(F)-(a) 

. E (-1)- "‘T' s , )-«-) - « + <»»■ 




Substituting in the well-known formula, for o S 1 


E (~i)* E P((«)+w» > 

t-O 


we get, for n ≥ a ≥ m,


n min 

= E (-ir^ E , 


d<"naax (0,o— a) 


E - («) + («)) { S ( 6 d 0( « - 1 ) }' 


(»)«(p)-(a) 


Thus the problem reduces to the summation of the following series:

Case 1: m = 1. In this case the series reduces to

E (-1) \6-d/\0 ifd<n-o. 
Hence for a ≥ 1,


X (-1)*"* £ PiCW - (a) + ( 7 ) -(W -(«)))(-!)"■“ 

e-inax(l,i»-<i) (7)~((’')~(<«))«(a) 

Writing (γ) − ((ν) − (α)) = (β), we obtain

$$p_{[(\alpha)]} = \sum_{b=\max(1-n+a,\,0)}^{a} (-1)^{b+1} \sum_{(\beta)\subset(\alpha)} P_1\big((\nu)-(\alpha)+(\beta)\big).$$

This is equivalent to (1.1), (ii), while (i) is trivial.
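Formula (1.1) lends itself to a direct numerical check. The sketch below is not part of the paper: it assumes events numbered 1, ..., n, a probability function given by random weights on the 2^n elementary outcomes (each outcome being the set of events that occur), and hypothetical helper names (`random_prob_function`, `P1`, `p_exact_from_P1`). It rebuilds every $p_{[(\alpha)]}$ from the $P_1$ values alone and compares it with the probability read off directly.

```python
import itertools
import random


def random_prob_function(n, seed=0):
    """Random probability function on the 2**n elementary outcomes.

    An outcome is the frozenset of events (numbered 1..n) that occur.
    """
    rng = random.Random(seed)
    outcomes = [frozenset(c) for k in range(n + 1)
                for c in itertools.combinations(range(1, n + 1), k)]
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def P1(prob, events):
    """P_1: probability that at least one of the given events occurs."""
    events = frozenset(events)
    return sum(p for o, p in prob.items() if o & events)


def p_exact_from_P1(prob, n, alpha):
    """Right-hand side of (1.1): p_[(alpha)] built from P_1 values only."""
    nu = frozenset(range(1, n + 1))
    alpha = frozenset(alpha)
    if not alpha:
        return 1 - P1(prob, nu)                  # (1.1), (i)
    a = len(alpha)
    rest = nu - alpha                             # (nu) - (alpha)
    total = 0.0
    for b in range(a + 1):
        if n - a + b == 0:                        # term excluded in (1.1), (ii)
            continue
        for beta in itertools.combinations(sorted(alpha), b):
            total += (-1) ** (b + 1) * P1(prob, rest | frozenset(beta))
    return total


n = 4
prob = random_prob_function(n, seed=1)
for k in range(n + 1):
    for alpha in itertools.combinations(range(1, n + 1), k):
        direct = prob[frozenset(alpha)]           # exactly the events of (alpha)
        assert abs(p_exact_from_P1(prob, n, alpha) - direct) < 1e-12
```

Because the outcome weights are arbitrary, agreement for every (α) exercises the formula on a general, non-symmetric probability function.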

Case 2: m ≥ 2. We have, for c ≥ 1,

which is easily proved by induction on a.

Hence for m ≥ 2,

- '-‘’t T I 

= ” + 2 Y‘ 

' ' n — 1 \a + d - tn/ 

Substituting in (1) we get formula (1). 

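To derive formula (2.1) for a fixed a, 1 ≤ a ≤ n, we sum (1.1), (ii) over all a-element combinations (α) ⊂ (ν), which gives

$$P_{[a]}((\nu)) = \sum_{(\alpha)\subset(\nu)} p_{[(\alpha)]} = \sum_{(\alpha)\subset(\nu)} \sum_{\substack{b=0\\ n-a+b\neq 0}}^{a} (-1)^{b+1} \sum_{(\beta)\subset(\alpha)} P_1\big((\nu)-(\alpha)+(\beta)\big).$$

Letting (ν) − (α) + (β) = (γ), a combination of c = n − a + b elements, and noting that a given (γ) arises from exactly $\binom{c}{n-a}$ pairs ((α), (β)) (one for each choice of the (n − a)-element combination (ν) − (α) inside (γ)), we get

$$P_{[a]}((\nu)) = \sum_{c=\max(1,\,n-a)}^{n} (-1)^{n-a+c-1}\binom{c}{n-a} \sum_{(\gamma)\subset(\nu)} P_1((\gamma)),$$

which is formula (2.1).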
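The counting step behind (2.1) can be checked in the same way. In this sketch (an illustration under the same hypothetical set-up of random outcome weights, not the author's procedure) the probability that exactly a events occur is compared with the right-hand side of (2.1), computed from the sums of $P_1$ over all c-element combinations.

```python
import itertools
import random
from math import comb


def random_prob_function(n, seed=0):
    """Random probability function; an outcome is the frozenset of occurring events."""
    rng = random.Random(seed)
    outcomes = [frozenset(c) for k in range(n + 1)
                for c in itertools.combinations(range(1, n + 1), k)]
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def P1(prob, events):
    """Probability that at least one of the given events occurs."""
    events = frozenset(events)
    return sum(p for o, p in prob.items() if o & events)


def P_exactly(prob, a):
    """P_[a]((nu)): probability that exactly a of the n events occur."""
    return sum(p for o, p in prob.items() if len(o) == a)


def P_exactly_via_21(prob, n, a):
    """Right-hand side of (2.1) for 1 <= a <= n."""
    total = 0.0
    for c in range(max(1, n - a), n + 1):
        # sum of P_1 over all c-element combinations (gamma) of (nu)
        s = sum(P1(prob, g) for g in itertools.combinations(range(1, n + 1), c))
        total += (-1) ** (n - a + c - 1) * comb(c, n - a) * s
    return total
```

For example, with `prob = random_prob_function(4)` the values `P_exactly_via_21(prob, 4, a)` and `P_exactly(prob, a)` agree for a = 1, ..., 4 up to rounding.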

The following form of Poincaré's formula is of assistance in deriving (2):

$$P_{[a]}((\nu)) = \sum_{b=a}^{n} (-1)^{b-a}\binom{b}{a} S_b((\nu)).$$

Substituting from (1), we get

p,.,(w) - 1 (-i)--(:)(: : ± (-»*-(: : ^ 

= t (-i)-pr(M) ( (:) (: : (:- \y}. 

Thus the problem reduces to the summation of the following series: 

«, ») - ^ (-»-(:)(: : o(: - ‘i)“' 

First, we have, for z ≥ 0, 1 ≤ w ≤ y,

2 (-1)* (*) (x + y) ■ {x + w) 

X— max(0,l--w) \*^/ 

0 if y — w + 1 < z, 

(-iTyHu + I - w)> , 1 ^ , 

[(2 + k; — l)!(t/ + 1 — — z)! ^ ’ 

which may be easily proved by induction on z.

Next, we have 

L(n, a, b, m) = Z I 

dl e—max(a,l!>) \C 0/ (c O) ! 

= - 1) ! T / _ 1 >«'+»-<■ (c' + b)(c' + b-m)\ 

* a! e'-nuix(0,a-6) \ c' ) (c' + 6 — o) ! 

>,b-a (m — 1)! 


= (-ir 


al 


n—b 

E i-iy' 

c'—TtmxCO.o— b) 


/n — 6\ (c' + 6 — m + 1)! + (m — l)(c' + 6 — m)! 
'V c' } (c' + 6-'^ 

= (-I)*-* — ' {T{n, a, b, m) + (m - l)Tin, o, 6, wi + 1)), 


where 
Tin, 


, a, 6, m) = 2 (-!)*(” 

c»«TOax (0,0—6) \ ^ / 


— A (c + 6 — rw + 1) ! 


0 


(c + 6 — a) ! - 

if 6<n — o + m— 1, 


(-l)""‘(a - m + 1)!(6 - m + 1)1 


(n — a) ! (a + 6 — n — m + 1) ! 


if b^n — a + m— I, 


by the preceding formula. Thus we get the explicit expression for L(n, a, b, m) given in formula (2), which is thereby proved.

The derivations of formulas (3) and (4) are similar to the above and may be omitted.
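The form of Poincaré's formula used above in deriving (2) can also be checked numerically. The sketch assumes, as is usual, that $S_b((\nu))$ denotes the sum, over all b-element combinations (β) of (ν), of the probability that every event of (β) occurs, with $S_0 = 1$. That definition is not restated in this section, and the helper names are hypothetical.

```python
import itertools
import random
from math import comb


def random_prob_function(n, seed=0):
    """Random probability function; an outcome is the frozenset of occurring events."""
    rng = random.Random(seed)
    outcomes = [frozenset(c) for k in range(n + 1)
                for c in itertools.combinations(range(1, n + 1), k)]
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def S(prob, n, b):
    """S_b((nu)): sum over b-combinations (beta) of P(every event of (beta) occurs).

    For b = 0 the single empty combination contributes 1.
    """
    return sum(sum(p for o, p in prob.items() if set(beta) <= o)
               for beta in itertools.combinations(range(1, n + 1), b))


def P_exactly(prob, a):
    """Probability that exactly a events occur."""
    return sum(p for o, p in prob.items() if len(o) == a)


def P_exactly_via_poincare(prob, n, a):
    """sum_{b=a}^{n} (-1)^(b-a) C(b, a) S_b((nu))."""
    return sum((-1) ** (b - a) * comb(b, a) * S(prob, n, b) for b in range(a, n + 1))
```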
Now we can give the essential argument for Theorems 1-4. It is evident that for any E in we have

$$P(E) = \sum p_{[(\alpha)]},$$

where the summation extends to certain combinations (α) ⊂ (ν). Substituting from formula (1.1) we get Theorem 1; substituting from formula (3) we get Theorem 3. Next, for any E in we have

$$P(E) = \sum P_{[a]}((\nu)),$$

where the summation extends to certain values of a. Substituting from formula (1.1), (i) and formula (2) we get Theorem 2; substituting from formula (3), (i) and formula (4) we get Theorem 4. We may note that these proofs are "constructive".

It remains to prove the uniqueness of the coefficients in Theorems 1-4. For Broderick's theorem this has been done by Fréchet*, by introducing "independent events". Our proof will be based on the conditions of existence, also initiated by Fréchet*, for the systems Pi((a)), Pm((a)), Pa\{v)), Pi‘'((i')).

The conditions of existence of the system pi((a)) have been given by the 
author in the paper*, though the proof there is not quite complete. 


1. Conditions of existence of the system $P_1^{(a)}((\nu))$. Given n quantities $Q_1^{(a)}$, 1 ≤ a ≤ n, what are the necessary and sufficient conditions that they may be the system of $P_1^{(a)}((\nu))$'s, 1 ≤ a ≤ n, of a probability function defined over $\mathfrak{B}(1, \ldots, n)$?

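From formula (1.1), (i) and formula (2) it is evident that necessary conditions are, for 1 ≤ a ≤ n,

$$1 - Q_1^{(n)} \ge 0, \qquad \sum_{\substack{b=n-a\\ b\neq 0}}^{n} (-1)^{n-a+b-1}\binom{b}{n-a} Q_1^{(b)} \ge 0, \tag{3}$$

and

$$\sum_{a=1}^{n}\ \sum_{\substack{b=n-a\\ b\neq 0}}^{n} (-1)^{n-a+b-1}\binom{b}{n-a} Q_1^{(b)} + 1 - Q_1^{(n)} = 1. \tag{4}$$

The last condition can be re-written as

$$\sum_{b=1}^{n} (-1)^{b-1} Q_1^{(b)} \sum_{a=\max(1,\,n-b)}^{n} (-1)^{n-a}\binom{b}{n-a} + 1 - Q_1^{(n)} = 1.$$

Since the inner sum vanishes for b < n, while for b = n the whole term equals $Q_1^{(n)}$, this reduces to the identity 1 = 1.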
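Because condition (4) is an identity in the $Q_1^{(b)}$, it holds for any numbers whatever. A minimal sketch with exact rational arithmetic (the function name `p_from_Q` is hypothetical) forms the left-hand sides of (3), namely $1 - Q_1^{(n)}$ and the sums for 1 ≤ a ≤ n, from arbitrary, not necessarily admissible, values and confirms that they always add up to 1. Only the inequalities (3) restrict the $Q_1^{(a)}$.

```python
import random
from fractions import Fraction
from math import comb


def p_from_Q(Q, n):
    """Left-hand sides of (3): [1 - Q[n], sum for a = 1, ..., sum for a = n].

    Q[b] stands for Q_1^(b), b = 1..n; Q[0] is unused.
    """
    p = [1 - Q[n]]
    for a in range(1, n + 1):
        p.append(sum((-1) ** (n - a + b - 1) * comb(b, n - a) * Q[b]
                     for b in range(max(1, n - a), n + 1)))
    return p


rng = random.Random(0)
for n in range(1, 8):
    # arbitrary rationals: condition (4) still holds identically
    Q = [None] + [Fraction(rng.randint(-50, 50), rng.randint(1, 20)) for _ in range(n)]
    assert sum(p_from_Q(Q, n)) == 1
```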


* "Conditions d'existence de systèmes d'événements associés à certaines probabilités," Jour. de Math., 1940. However, our interpretation of the term would mean instead "conditions of existence of a probability function defined over a Borel set of events, etc."
To show that the conditions (3) are sufficient, put

$$p_{[a]} = \sum_{\substack{b=n-a\\ b\neq 0}}^{n} (-1)^{n-a+b-1}\binom{b}{n-a} Q_1^{(b)} \quad (1 \le a \le n), \qquad p_{[0]} = 1 - Q_1^{(n)}.$$

By (3) and (4) we have, for 0 ≤ a ≤ n,

$$p_{[a]} \ge 0 \quad\text{and}\quad \sum_{a=0}^{n} p_{[a]} = 1.$$

Hence they are actually the $P_{[a]}((\nu))$'s of a probability function. We want to show that the $P_1^{(a)}((\nu))$'s of this probability function coincide with the given $Q_1^{(a)}$'s, so that this is the probability function we seek. We have,

z p.(w) = zpi., z (;)(?";) 

(tf) • (») a— 1 A**mAx(l.b— n-f«) \^/ 

.^0 l-nJ??.-.) ^ ^ Vn - o/ W \b • 

Now the series in curly brackets

i.)C) 

- z 

a«>nux(ltfi— c) \ V / 

If c = n, the last 

= ©-S<->-(n-6)C:‘) 

iJ ‘r„: 


• z (-ir 

a*niax(l»fi->e) 


If c < n, we have 




Therefore 




6 =» c; 
b 9^ c» 
2. Conditions of existence of the system $P_{(1)}((\alpha))$. Given $2^n - 1$ quantities $q_{(1)}((\alpha))$, (α) ⊂ (ν), a ≥ 1, what are the necessary and sufficient conditions that they may be the system of $P_{(1)}((\alpha))$'s of a probability function defined over $\mathfrak{B}(1, \ldots, n)$?

From formula (3) it is evident that necessary conditions are


(5) 


1 n minCc.ii— o) 

-Z z 

Tl e«>l d<*max( 0 .e— a) 




n — 1 
a + d — 



, Z 9111 {(>) - («) + (5)) ^ 0, 

(4) « (*»)—(«) 

(7)-(4) • (a) 



Z Prii((T)) ^ 0; 

(ir) « (») 


and 


( 6 ) 


1 n min(«.n— a) / ^ 1 X"*! 

i+i E £ E ,) 

W («)«(») 1 d**max( 0 .c— a) i" A/ 


(a)«(0-(«) 5ni((7) — (5) + (5)) == 1. 

(7)-(«)f(a) 


Consider the sum 


inin(e,n»a) / 1 

Z Z (- 1)“ („ 1 , ) Z 9m((7) - («) + («)). 

(«)«(») d-*max(0,«— a) + (X 1/ (4)c(r)>-(fli) 


('y)-(a)fr«) 


For a fixed ({), the number of ways of writing (y) = (y) — (i) + (5) is 


dJ’ 


then since (y) — (5) c (a) but (a) — ((y) — (5)) « (v) — (y), the number of 

( Tl C 1 

a — c + d)' coefficient of gii)((y)) in the sum is 

Therefore the condition (6) reduces to the identity 1 = 1.

To show that conditions (5) are sufficient, put the left-hand sides of (5) equal to $p_{[(\alpha)]}$ and $p_{[0]}$ respectively. Then


P((«» = Z Pt<«)+(«l 

mtlP)-(a) 


( 7 ) 


1 fi II— a min(c.yi— a— 6) / ^ 1 \“"4 

= -Z(-ir*Z Z (-i)*'! i) Z 

ft c—l 6*0 d— inax(0,e— a— 6) ® X/ (^)c(r)— (a) 


Z “ {^) + (5)). 

<*)«(r)-<«)-<« 

<ir)-(»)*<«)+W) 
Let (γ) = ((γ) − (φ)) + (φ), where (φ) ⊂ (α), (γ) − (φ) ⊂ (ν) − (α). Then the sum in the curly brackets can be written, by a combinatorial calculation, as

9fil((7) ~ W + (0)). 




+ b 
+ c 


+ d-l\ 
-f-l)- 


The sum in the last curly brackets is

1 \"“1 min(c-/,n— a— b) / A / 

Inverting the order of summations, 

/ „_1 /c.f\ /a + b + d-l\ 

\a + C 1/ rf-max(oi^/-n+a) \ d / \« + C — / — 1/ 


\a + c — / — 1/ <i-m»x(o.c-/-n+fl) \ d / \a + c ^ f) 


Hence (7) reduces to 

?((.» » ^ z (-ir* z 9w((7)). 


- if / = c, 

a 

0 it f c. 


a e-l 




Then 


S6((a)) 


= Z Pm) 


= l z 

0 (l)«(a) 
dyO) 




9m((*)) 


Pw((«)) 


a 



(-ir‘5»((a)) 


- ..if., {s 

4f»<0 


The conditions of existence of the system Pi‘'((v)), 1 ≤ a ≤ n, are similarly deduced from formula (3), (i) and formula (4) with m = 1.

Now we can prove the uniqueness of the coefficients in Theorems 1-4. Since the proofs are all exactly similar, we take Theorem 2. Suppose, if possible, there exists another system of coefficients $c'_a$, 0 ≤ a ≤ n, so that

Taking the difference, we get a linear polynomial in the variables $P_1^{(a)}((\nu))$, 1 ≤ a ≤ n, which must vanish:

$$(c_0 - c'_0) + \sum_{a=1}^{n} (c_a - c'_a)\,P_1^{(a)}((\nu)) = 0, \tag{8}$$

for all "admissible" values of the variables. These values, say $Q_1^{(a)}$, are precisely those which satisfy the conditions (3).

It is evidently easy to construct a system of $Q_1^{(a)}$, 1 ≤ a ≤ n, which satisfy the conditions (3) written with the sign of strict inequality. Hence in a sufficiently small neighborhood of the point $(Q_1^{(1)}, Q_1^{(2)}, \ldots, Q_1^{(n)})$ in the n-dimensional space these strict inequalities still hold. Hence the polynomial vanishes in this neighborhood and so must vanish identically; that is,

$$c_a - c'_a = 0 \quad \text{for } 0 \le a \le n. \qquad \text{Q. E. D.}$$



ON THE EFFICIENT DESIGN OF STATISTICAL 
INVESTIGATIONS 

By Abraham Wald

Columbia University 

1. Introduction. A theory of efficient design of statistical investigations has been developed by R. A. Fisher* and his followers mainly in connection with agricultural experimentation. However, the same methods can be applied to other fields also. All statistical designs treated in the aforementioned theory refer to problems of testing linear hypotheses. By testing a linear hypothesis we mean the following problem: Let $y_1, \ldots, y_N$ be N independently and normally distributed variates with a common variance $\sigma^2$. It is assumed that the expected value of $y_\alpha$ is given by

$$E(y_\alpha) = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \cdots + \beta_p x_{p\alpha} \qquad (\alpha = 1, \ldots, N) \tag{1}$$

where the quantities $x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, N$) are known constants and $\beta_1, \ldots, \beta_p$ are unknown constants. The coefficients $\beta_1, \ldots, \beta_p$ are called the population regression coefficients of y on $x_1, x_2, \ldots, x_p$, respectively. The hypothesis that the unknown regression coefficients $\beta_1, \ldots, \beta_p$ satisfy a set of linear equations

$$g_{i1}\beta_1 + \cdots + g_{ip}\beta_p = \rho_i \qquad (i = 1, \ldots, r;\ r \le p), \tag{2}$$

is called a linear hypothesis. The problem under consideration is that of testing the hypothesis (2) on the basis of the observed values $y_1, \ldots, y_N$.

In many cases the experimenter has a certain amount of freedom in the choice of the values $x_{i\alpha}$. The efficiency of the test is greatly affected by the values of $x_{i\alpha}$. The statistical investigation is efficiently designed if the values $x_{i\alpha}$ are chosen so that the sensitivity of the test is maximized. Let us illustrate this by a simple example. Suppose that x and y have a bivariate normal distribution and we want to test the hypothesis that the regression coefficient β of y on x has a particular value $\beta_0$. Suppose, furthermore, that the test has to be carried out on the basis of N pairs of observations $(x_1, y_1), \ldots, (x_N, y_N)$, where the experiments are performed in such a way that $x_1, \ldots, x_N$ are not random variables but have predetermined fixed values. It is known that the variance of the least square estimate b of β is inversely proportional to $\sum_{\alpha=1}^{N} (x_\alpha - \bar x)^2$, where $\bar x = (x_1 + \cdots + x_N)/N$. Hence, if we can freely choose the values $x_1, \ldots, x_N$ in a certain domain D, the greatest sensitivity of the test will be achieved by choosing $x_1, \ldots, x_N$ so that $\sum_{\alpha} (x_\alpha - \bar x)^2$ becomes a maximum.
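As an illustration of this example (not taken from the paper), assume the domain is D = [0, 1] and N = 10. With an intercept in the model the variance of b is $\sigma^2/\sum(x_\alpha - \bar x)^2$, so comparing designs reduces to comparing this sum. The sketch contrasts equally spaced points with placing half of the points at each end of the interval. The designs and function names are hypothetical.

```python
def sum_sq_dev(xs):
    """sum (x_alpha - xbar)^2; the variance of the slope b is sigma^2 divided by this."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


def slope_variance(xs, sigma2=1.0):
    """Variance of the least square slope for fixed x values and error variance sigma2."""
    return sigma2 / sum_sq_dev(xs)


N = 10
equally_spaced = [i / (N - 1) for i in range(N)]      # hypothetical design in D = [0, 1]
end_points = [0.0] * (N // 2) + [1.0] * (N // 2)      # half the points at each end
```

Here `sum_sq_dev(end_points)` is 2.5 against roughly 1.02 for the equally spaced design, so the slope variance is less than half as large. For even N no design in [0, 1] can exceed N/4, because the variance of numbers confined to an interval of length 1 is at most 1/4.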

In the next section we will introduce a measure of the efficiency of the design of a statistical investigation for testing a linear hypothesis. In sections 3 and 4 it will be shown that some well known experimental designs, used widely in agricultural experimentation, are most efficient in the sense of the definition given in section 2.

* See for instance R. A. Fisher, The Design of Experiments, Oliver and Boyd, London, 1935.


2. A measure of the efficiency of the design of a statistical investigation for testing a linear hypothesis. The hypothesis (2) can be reduced by a suitable linear transformation to the canonical form

$$\beta_1 = \beta_2 = \cdots = \beta_r = 0 \qquad (r \le p). \tag{3}$$

Hence, we can restrict ourselves without loss of generality to the consideration of the hypothesis (3).

Denote $\sum_{\alpha=1}^{N} x_{i\alpha} x_{j\alpha}$ by $a_{ij}$ and let the matrix $\|c_{ij}\|$ be the inverse of the matrix $\|a_{ij}\|$ ($i, j = 1, \ldots, p$). Denote by $b_i$ the least square estimate of $\beta_i$ ($i = 1, \ldots, p$). It is known that the estimates $b_1, \ldots, b_p$ have a joint normal distribution with mean values $\beta_1, \ldots, \beta_p$, respectively. It is furthermore known that the covariance of $b_u$ and $b_v$ is equal to $\sigma^2 c_{uv}$. The statistic used for testing the hypothesis (3) is given by


$$F = \frac{N-p}{r}\,\frac{\sum_{l=1}^{r}\sum_{m=1}^{r} a^*_{lm}\, b_l b_m}{\sum_{\alpha=1}^{N} (y_\alpha - b_1 x_{1\alpha} - \cdots - b_p x_{p\alpha})^2}, \tag{4}$$

where $\|a^*_{lm}\|$ is the inverse of $\|c_{lm}\|$ ($l, m = 1, \ldots, r$). When the hypothesis (3) is true, the statistic F has the F-distribution with r and N − p degrees of freedom. The critical region for testing the hypothesis (3) is given by the inequality

$$F \ge F_0, \tag{5}$$

where the constant $F_0$ is determined so that the probability that $F \ge F_0$ (calculated under the assumption that (3) holds) is equal to the level of significance we wish to have.

It is known that the power function* of the critical region (5) depends only on the single parameter

$$\lambda = \frac{1}{\sigma^2}\sum_{l=1}^{r}\sum_{m=1}^{r} a^*_{lm}\,\beta_l\beta_m. \tag{6}$$

Furthermore this power function is a monotonically increasing function of λ. The coefficients $a^*_{lm}$ are functions of the quantities $x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, N$). The choice of the values $x_{i\alpha}$ is the better the greater the corresponding value of λ. If r = 1, the expression λ reduces to $a^*_{11}\beta_1^2/\sigma^2$.

* See for instance P. C. Tang, "The power function of the analysis of variance tests," Stat. Res. Mem., Vol. II, 1938.


Hence, if r = 1, we maximize λ by maximizing $a^*_{11}$. Since $a^*_{11} = 1/c_{11}$, we maximize λ by minimizing $c_{11}$. Thus, if r = 1, we can say that we obtain the most powerful test by minimizing $c_{11}$, i.e. by minimizing the variance of $b_1$. If r > 1, the difficulty arises that no set of values $x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, N$) can be found for which λ becomes a maximum irrespective of the values of the unknown parameters $\beta_1, \ldots, \beta_r$. Hence, if r > 1, we have to be satisfied with some compromise solution. For this purpose let us consider the unit sphere

$$\beta_1^2 + \cdots + \beta_r^2 = 1 \tag{7}$$

in the space of the parameters $\beta_1, \ldots, \beta_r$. It is known that the smallest root in ρ of the determinantal equation

$$\begin{vmatrix} a^*_{11}-\rho & a^*_{12} & \cdots & a^*_{1r}\\ a^*_{21} & a^*_{22}-\rho & \cdots & a^*_{2r}\\ \vdots & \vdots & & \vdots\\ a^*_{r1} & a^*_{r2} & \cdots & a^*_{rr}-\rho \end{vmatrix} = 0 \tag{8}$$

is equal to the minimum value of $\sigma^2\lambda$ on the unit sphere (7). Similarly the greatest root of (8) is equal to the maximum value of $\sigma^2\lambda$ on the sphere (7). The compromise solution of maximizing the smallest root of (8) seems to be a very reasonable one. However, for the sake of certain mathematical simplifications, we propose to maximize the product of the r roots of (8). Since the product of the roots of (8) is equal to the determinant

$$\begin{vmatrix} a^*_{11} & \cdots & a^*_{1r}\\ \vdots & & \vdots\\ a^*_{r1} & \cdots & a^*_{rr} \end{vmatrix}, \tag{9}$$

we have to maximize the determinant (9). The value of the determinant $|c_{lm}|$ ($l, m = 1, \ldots, r$) is the reciprocal of that of (9). Hence we maximize (9) by minimizing the determinant $|c_{lm}|$. The generalized variance of the set of variates $b_1, \ldots, b_r$ is equal to the product of $\sigma^{2r}$ and the determinant $|c_{lm}|$. Thus, our result can be expressed as follows: The optimum choice of the values $x_{i\alpha}$ is that for which the generalized variance of the variates $b_1, \ldots, b_r$ becomes a minimum.
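For r = 2 the relations between (6), (8) and (9) can be checked directly. The sketch assumes a hypothetical positive definite matrix $\|c_{lm}\|$ and forms $\|a^*_{lm}\|$ as its inverse. It then computes the two roots of (8) in closed form and compares them with the extreme values of $\sum a^*_{lm}\beta_l\beta_m = \sigma^2\lambda$ over a fine grid on the unit circle (7). It also confirms that the product of the roots equals $1/|c_{lm}|$.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def roots_symmetric_2x2(m):
    """Roots rho of |m - rho I| = 0 for a symmetric 2x2 matrix, smallest first."""
    (a, b), (_, d) = m
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return centre - radius, centre + radius


def quad_form(m, v):
    """sum_{l,m} m[l][m] v[l] v[m]."""
    return sum(m[i][j] * v[i] * v[j] for i in range(2) for j in range(2))


c_lm = [[0.5, 0.2], [0.2, 0.4]]          # hypothetical ||c_lm|| for r = 2
a_star = inverse_2x2(c_lm)               # ||a*_lm||
rho_min, rho_max = roots_symmetric_2x2(a_star)

# sigma^2 * lambda sampled over the unit circle (7)
samples = [quad_form(a_star, (math.cos(t), math.sin(t)))
           for t in (k * math.pi / 2000 for k in range(4000))]
det_c = c_lm[0][0] * c_lm[1][1] - c_lm[0][1] * c_lm[1][0]
```

With these numbers the roots are about 1.52 and 4.10, and their product is 6.25 = 1/0.16, the reciprocal of $|c_{lm}|$.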

Any set of pN values $x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, N$) can be represented by a point in the pN-dimensional Cartesian space.* Denote by D the set of all points in the pN-dimensional space which we are free to choose. If N is fixed and if any point of D can be equally well chosen, the following two definitions seem to be appropriate:

Definition 1. Denote by c the minimum value of the determinant $|c_{lm}|$ ($l, m = 1, \ldots, r$) in the domain D. Then the ratio $c/|c_{lm}|$ is called the efficiency of the design of the statistical investigation for testing the hypothesis (3).

Definition 2. The design of the statistical investigation for testing the hypothesis (3) is said to be most efficient if its efficiency is equal to 1.
3. Efficiency of the Latin square design. A widely used and important design in agricultural experimentation is the so-called Latin square. Suppose we wish to find out by experimentation whether there is any significant difference among the yields of m different varieties $v_1, \ldots, v_m$. For this purpose the experimental area is subdivided into $m^2$ plots lying in m rows and m columns and each plot is assigned to one of the varieties $v_1, \ldots, v_m$. If each variety appears exactly once in each row and exactly once in each column, we have a Latin square arrangement. Denote by $y_{ijk}$ the yield of the variety $v_k$ on the plot which lies in the i-th row and j-th column. The subscript k is, of course, a single valued function of the subscripts i and j, since to each plot only one variety is assigned. The following assumptions are made: the variates $y_{ijk}$ are independently and normally distributed with a common variance $\sigma^2$ and the expected value of $y_{ijk}$ is given by

$$E(y_{ijk}) = \mu_i + \nu_j + \rho_k. \tag{10}$$

The parameters $\sigma^2$, $\mu_i$, $\nu_j$ and $\rho_k$ are unknown. The hypothesis to be tested is the hypothesis that variety has no effect on yield, i.e.

$$\rho_1 = \rho_2 = \cdots = \rho_m. \tag{11}$$

We associate the positive integer $\alpha(i, j) = (i - 1)m + j$ with the plot which lies in the i-th row and j-th column ($i, j = 1, \ldots, m$). It is clear that for any positive integer $\alpha \le m^2$ there exists exactly one plot, i.e. exactly one pair of values i and j, such that $\alpha = \alpha(i, j)$. In the following discussions the symbol $y_\alpha$ ($\alpha = 1, \ldots, m^2$) will denote the yield $y_{ijk}$ where the indices i and j are determined so that $\alpha(i, j) = \alpha$. The plot in the i-th row and j-th column will be called the α-th plot where $\alpha = \alpha(i, j)$.

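We define the symbols $t_{i\alpha}$, $u_{j\alpha}$, $z_{k\alpha}$ ($i, j, k = 1, \ldots, m$; $\alpha = 1, \ldots, m^2$) as follows: $t_{i\alpha} = 1$ if the α-th plot lies in the i-th row, and $t_{i\alpha} = 0$ otherwise. Similarly $u_{j\alpha} = 1$ if the α-th plot lies in the j-th column, and $u_{j\alpha} = 0$ otherwise. Finally $z_{k\alpha} = 1$ if the k-th variety is assigned to the α-th plot, and $z_{k\alpha} = 0$ otherwise. Then equation (10) can be written as

$$E(y_\alpha) = \mu_1 t_{1\alpha} + \cdots + \mu_m t_{m\alpha} + \nu_1 u_{1\alpha} + \cdots + \nu_m u_{m\alpha} + \rho_1 z_{1\alpha} + \cdots + \rho_m z_{m\alpha}. \tag{12}$$

Denote the arithmetic means $\frac{1}{m^2}\sum_{\alpha=1}^{m^2} t_{i\alpha}$, $\frac{1}{m^2}\sum_{\alpha=1}^{m^2} u_{i\alpha}$ and $\frac{1}{m^2}\sum_{\alpha=1}^{m^2} z_{i\alpha}$ by $\bar t_i$, $\bar u_i$ and $\bar z_i$, respectively. Let $t'_{i\alpha} = t_{i\alpha} - \bar t_i$, $u'_{i\alpha} = u_{i\alpha} - \bar u_i$, $z'_{i\alpha} = z_{i\alpha} - \bar z_i$, $\mu'_i = \mu_i - \mu_m$, $\nu'_i = \nu_i - \nu_m$ and $\rho'_i = \rho_i - \rho_m$ for $i = 1, \ldots, m-1$. Let furthermore $w_\alpha = 1$ for $\alpha = 1, \ldots, m^2$. Then we have

$$\begin{aligned} t_{i\alpha} &= t'_{i\alpha} + \bar t_i w_\alpha, \quad u_{i\alpha} = u'_{i\alpha} + \bar u_i w_\alpha, \quad z_{i\alpha} = z'_{i\alpha} + \bar z_i w_\alpha \qquad (i = 1, \ldots, m-1),\\ t_{m\alpha} &= (1 - \bar t_1 - \cdots - \bar t_{m-1})\, w_\alpha - t'_{1\alpha} - \cdots - t'_{m-1,\alpha},\\ u_{m\alpha} &= (1 - \bar u_1 - \cdots - \bar u_{m-1})\, w_\alpha - u'_{1\alpha} - \cdots - u'_{m-1,\alpha},\\ z_{m\alpha} &= (1 - \bar z_1 - \cdots - \bar z_{m-1})\, w_\alpha - z'_{1\alpha} - \cdots - z'_{m-1,\alpha}. \end{aligned} \tag{13}$$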
From (12) and (13) we obtain

$$E(y_\alpha) = \bar\mu\, w_\alpha + \sum_{i=1}^{m-1}\mu'_i t'_{i\alpha} + \sum_{i=1}^{m-1}\nu'_i u'_{i\alpha} + \sum_{i=1}^{m-1}\rho'_i z'_{i\alpha}, \tag{14}$$

where

$$\bar\mu = \sum_{i=1}^{m-1}\mu'_i\bar t_i + \sum_{i=1}^{m-1}\nu'_i\bar u_i + \sum_{i=1}^{m-1}\rho'_i\bar z_i + \mu_m + \nu_m + \rho_m.$$

The hypothesis (11) can be written as

$$\rho'_1 = \rho'_2 = \cdots = \rho'_{m-1} = 0. \tag{15}$$

This is a linear hypothesis in canonical form as given in (3). The values $z'_{i\alpha}$ ($i = 1, \ldots, m-1$; $\alpha = 1, \ldots, m^2$) depend on the way in which the varieties $v_1, \ldots, v_m$ are assigned to the plots. We will show that we obtain a most efficient design if we distribute the varieties over the $m^2$ plots in a Latin square arrangement, i.e. if each variety appears exactly once in each row and exactly once in each column.

Let $q_{1\alpha} = w_\alpha$, $q_{i+1,\alpha} = t'_{i\alpha}$ ($i = 1, \ldots, m-1$), $q_{m+j,\alpha} = u'_{j\alpha}$ ($j = 1, \ldots, m-1$) and $q_{2m-1+k,\alpha} = z'_{k\alpha}$ ($k = 1, \ldots, m-1$). Denote $\sum_{\alpha=1}^{m^2} q_{i\alpha}q_{j\alpha}$ by $a_{ij}$ ($i, j = 1, 2, \ldots, 3m-2$) and let the matrix $\|c_{ij}\|$ be the inverse of the matrix $\|a_{ij}\|$ ($i, j = 1, \ldots, 3m-2$). Let us denote by Δ the determinant $|a_{ij}|$ ($i, j = 1, \ldots, 3m-2$), by $\Delta_1$ the determinant $|a_{ij}|$ ($i, j = 1, \ldots, 2m-1$), by $\Delta_2$ the determinant $|a_{ij}|$ ($i, j = 2m, \ldots, 3m-2$) and by $\Delta_3$ the determinant $|c_{ij}|$ ($i, j = 2m, \ldots, 3m-2$). We have to show that for the Latin square arrangement $\Delta_3$ becomes a minimum. From a known theorem* about determinants it follows that

$$\Delta_3 = \Delta_1/\Delta. \tag{16}$$

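Hence, we have merely to show that $\Delta/\Delta_1$ becomes a maximum for the Latin square arrangement. Denote by $\bar\Delta$, $\bar\Delta_1$ and $\bar\Delta_2$ the values taken by Δ, $\Delta_1$ and $\Delta_2$, respectively, in the case of a Latin square arrangement. Since, for the Latin square arrangement, as is known,

$$\sum_{\alpha=1}^{m^2} z'_{k\alpha} w_\alpha = \sum_{\alpha=1}^{m^2} z'_{k\alpha} t'_{i\alpha} = \sum_{\alpha=1}^{m^2} z'_{k\alpha} u'_{j\alpha} = 0 \qquad (i, j, k = 1, \ldots, m-1),$$

we have

$$\frac{\bar\Delta}{\bar\Delta_1} = \bar\Delta_2. \tag{17}$$

Since the matrix $\|a_{ij}\|$ ($i, j = 1, \ldots, 3m-2$) is positive definite, we have

$$\frac{\Delta}{\Delta_1} \le \Delta_2. \tag{18}$$

* See M. Bôcher, Introduction to Higher Algebra, 1931, p. 31.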
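Because of (17) and (18) the Latin square design is proved to be most efficient if we show that $\Delta_2 \le \bar\Delta_2$.

Denote by $\Delta_2^*$ the m-rowed determinant $|a_{ij}|$ ($i, j = 1, 2m, 2m+1, \ldots, 3m-2$). Since $a_{1j} = 0$ for $j \ne 1$, we have

$$\Delta_2^* = a_{11}\Delta_2 = m^2\Delta_2. \tag{19}$$

Denote $\sum_{\alpha=1}^{m^2} z_{i\alpha} z_{j\alpha}$ by $b_{ij}$ ($i, j = 1, \ldots, m$). Then

$$b_{ij} = 0 \ \text{ for } i \ne j \quad\text{and}\quad b_{ii} = N_i, \tag{20}$$

where $N_i$ denotes the number of plots to which the variety $v_i$ has been assigned. Because of (20) we have

$$\begin{vmatrix} b_{11} & \cdots & b_{1m}\\ \vdots & & \vdots\\ b_{m1} & \cdots & b_{mm} \end{vmatrix} = N_1 N_2 \cdots N_m. \tag{21}$$

According to (13) we have

$$\begin{aligned} z'_{i\alpha} + \bar z_i w_\alpha &= z_{i\alpha} \qquad (i = 1, \ldots, m-1),\\ -z'_{1\alpha} - \cdots - z'_{m-1,\alpha} + w_\alpha(1 - \bar z_1 - \cdots - \bar z_{m-1}) &= z_{m\alpha}. \end{aligned} \tag{22}$$

The determinant of these equations is given by

$$X = \begin{vmatrix} 1 & 0 & \cdots & 0 & \bar z_1\\ 0 & 1 & \cdots & 0 & \bar z_2\\ \vdots & & \ddots & & \vdots\\ 0 & 0 & \cdots & 1 & \bar z_{m-1}\\ -1 & -1 & \cdots & -1 & S \end{vmatrix}, \tag{23}$$

where $S = 1 - \bar z_1 - \bar z_2 - \cdots - \bar z_{m-1}$. It is easy to verify that

$$X = 1. \tag{24}$$

From (21), (22) and (24) it follows that

$$\Delta_2^* = N_1 N_2 \cdots N_m. \tag{25}$$

Hence, from (19) we obtain

$$\Delta_2 = N_1 N_2 \cdots N_m / m^2. \tag{26}$$

In the case of a Latin square design we have $N_1 = N_2 = \cdots = N_m = m$. Hence

$$\bar\Delta_2 = m^{m-2}. \tag{27}$$

Because of the condition $N_1 + N_2 + \cdots + N_m = m^2$, the right hand side of (26) becomes a maximum when $N_1 = N_2 = \cdots = N_m = m$. Thus $\Delta_2 \le \bar\Delta_2$, and consequently the Latin square design is proved to be most efficient.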
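The determinant relations (16) to (27) can be verified exactly for m = 3. The sketch below is an illustration, not Wald's computation. It encodes a layout as an m × m table of variety labels 0, ..., m−1, builds the vectors $q_{1\alpha}, \ldots, q_{3m-2,\alpha}$ in the order used above (taking varieties 0, ..., m−2 as the first m − 1), and evaluates Δ, $\Delta_1$ and $\Delta_2$ with rational arithmetic. It compares a Latin square, a non-Latin layout with equal counts $N_i$, and a layout with unequal counts; the layouts are arbitrary examples.

```python
from fractions import Fraction


def det(mat):
    """Determinant by exact Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in mat]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def centred(indicator):
    """Subtract the arithmetic mean, as in t' = t - t-bar."""
    mean = Fraction(sum(indicator), len(indicator))
    return [x - mean for x in indicator]


def q_vectors(layout):
    """q_1..q_{3m-2}: w, then t'_i, u'_j, z'_k for the first m-1 rows, columns, varieties.

    layout[i][j] is the variety (0..m-1) on the plot in row i, column j.
    """
    m = len(layout)
    plots = [(i, j) for i in range(m) for j in range(m)]   # alpha = i*m + j + 1
    w = [Fraction(1)] * (m * m)
    t = [centred([int(i == r) for i, j in plots]) for r in range(m - 1)]
    u = [centred([int(j == c) for i, j in plots]) for c in range(m - 1)]
    z = [centred([int(layout[i][j] == k) for i, j in plots]) for k in range(m - 1)]
    return [w] + t + u + z


def deltas(layout):
    """(Delta, Delta_1, Delta_2) as defined in the text."""
    m = len(layout)
    q = q_vectors(layout)
    a = [[sum(x * y for x, y in zip(qi, qj)) for qj in q] for qi in q]
    k = 2 * m - 1                      # indices 1..2m-1 form Delta_1
    return (det(a),
            det([row[:k] for row in a[:k]]),
            det([row[k:] for row in a[k:]]))


latin = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
same_counts = [[0, 0, 1], [1, 2, 2], [2, 1, 0]]   # N_i = 3 each, not a Latin square
unequal = [[0, 0, 1], [0, 1, 2], [2, 1, 1]]       # N = (3, 4, 2)
```

The second layout shows why the argument needs both steps: by (26) it attains the same $\Delta_2 = 3$ as the Latin square, since $\Delta_2$ depends only on the $N_i$, but its $\Delta/\Delta_1$ falls strictly below $\Delta_2$ because its variety vectors are not orthogonal to the row vectors.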
4. Efficiency of Graeco-Latin and higher squares. Consider m varieties $v_1, \ldots, v_m$ and m treatments. Suppose that we wish to find out by experimentation whether the yield is affected by varieties or treatments. For this purpose the experimental area is subdivided into $m^2$ plots lying in m rows and m columns, and to each plot one of the varieties and one of the treatments is assigned. We call this arrangement a Graeco-Latin square if the following conditions are fulfilled: 1) each variety appears exactly once in each row and exactly once in each column; 2) each treatment appears exactly once in each row and exactly once in each column; 3) each variety is combined with each of the treatments exactly once.

The following general abstract scheme includes the Latin square and Graeco-Latin square as special cases: Consider an r-way classification with m classes in each classification. Denote by $y_{a_1 a_2 \cdots a_r}$ the value of a certain characteristic of an individual who is classified in the $a_1$-class of the first classification, in the $a_2$-class of the second classification, ..., and in the $a_r$-class of the r-th classification. Suppose that $m^2$ observations are made for the purpose of investigating the effect of the classes on the value of the characteristic under consideration. We will say that we have a generalized Latin square design if the following condition is fulfilled: Let i, j, m′ and m″ be an arbitrary set of four positive integers for which $i \ne j$, $i \le r$, $j \le r$, $m' \le m$ and $m'' \le m$. Then among the $m^2$ individuals observed there exists exactly one individual who belongs to the m′-class of the i-th classification and the m″-class of the j-th classification.

It is clear that if r = 3 the above scheme is a Latin square. If r = 4 we have a Graeco-Latin square.
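Wald's defining condition for a generalized Latin square concerns pairs of classifications, so it is easy to test mechanically. The sketch assumes classes labelled 0, ..., m−1 and builds its examples with the cyclic construction (i + j) mod m and (i + 2j) mod m. That construction is a standard device that yields orthogonal squares when m is prime (here m = 3). Neither the construction nor the function name comes from the paper.

```python
from itertools import combinations, product


def is_generalized_latin_square(observations, m):
    """Check the condition of the text.

    observations: m*m tuples (a_1, ..., a_r) of class labels 0..m-1.
    For every pair of classifications i != j, each pair of classes (m', m'')
    must occur in exactly one observation.
    """
    if len(observations) != m * m:
        return False
    r = len(observations[0])
    all_pairs = sorted(product(range(m), repeat=2))
    for i, j in combinations(range(r), 2):
        # m*m observations covering all m*m pairs means each pair occurs exactly once
        if sorted((obs[i], obs[j]) for obs in observations) != all_pairs:
            return False
    return True


m = 3
cells = [(i, j) for i in range(m) for j in range(m)]
latin = [(i, j, (i + j) % m) for i, j in cells]                          # r = 3
graeco_latin = [(i, j, (i + j) % m, (i + 2 * j) % m) for i, j in cells]  # r = 4
not_latin = [(i, j, i % m) for i, j in cells]   # variety constant along each row
```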

Assume that the observations $y_{a_1 a_2 \cdots a_r}$ ($a_1, a_2, \ldots, a_r = 1, \ldots, m$) are normally and independently distributed with a common variance $\sigma^2$. Assume furthermore that the expected value of $y_{a_1 \cdots a_r}$ is given by

$$E(y_{a_1 \cdots a_r}) = \gamma_{1a_1} + \cdots + \gamma_{ra_r}.$$

The parameters $\sigma^2$ and $\gamma_{ia}$ ($i = 1, \ldots, r$; $a = 1, \ldots, m$) are unknown constants. Suppose that we wish to test the hypothesis that

$$\gamma_{i1} = \gamma_{i2} = \cdots = \gamma_{im}. \tag{28}$$

It can be shown that if the number of observations is limited to $m^2$, we obtain a most efficient design by constructing a generalized Latin square. The proof of this statement is similar to that of the efficiency of the Latin square and is therefore omitted.



SOME SIGNIFICANCE TESTS FOR NORMAL BIVARIATE 
DISTRIBUTIONS 

By D. S. Villars and T. W. Anderson 

United States Rubber Company, Passaic, New Jersey, and Princeton University

1. Introduction. In the theory of linear regression of y on x, where y is normally distributed about a linear function of x, say ν + βx, where x is a "fixed" variate, the t-test for the hypothesis that β is zero (that y is distributed about ν independent of x) is well known. In this paper we apply some general statistical theory to the similar problem where x and y are jointly normally distributed. This case is commonly known as the case of "error in both variates." We derive a criterion for testing the hypothesis that the population means are the coordinates of a specified point when the ratio of the variances and the population correlation coefficient are known. When the ratio of variances is known, a criterion is derived to test whether the correlation coefficient is zero.

2. The means. Let us consider a sample of n pairs of observations $(x_1, y_1; x_2, y_2; \ldots; x_n, y_n)$ from a normal bivariate population. Let the variances of x and of y be $\sigma_x^2$ and $\sigma_y^2$, respectively, and let the correlation coefficient, say ρ, be zero. Suppose the ratio of the weight of y to the weight of x, say $\gamma = w_y/w_x = \sigma_x^2/\sigma_y^2$, is known although the variances are not known. It is clear, then, that $\sqrt{\gamma}\,y$ has variance $\sigma_x^2$. Since the observations $y_i$ ($i = 1, 2, \ldots, n$) can be transformed into revised observations $\sqrt{\gamma}\,y_i = y'_i$, we lose no generality by assuming that x and y are both distributed with variance $\sigma^2$.

Under the assumption of equality of variances and independence of variates we shall derive a criterion for testing the null hypothesis that each observation $x_i$ is of a variate distributed about the same population mean μ and each observation $y_i$ is of a variate distributed about the same population mean ν. The hypothesis may be stated symbolically as:

$$H_0:\ E(x) = \mu,\quad E(y) = \nu,$$

given $\sigma_x^2 = \sigma_y^2 = \sigma^2$ and ρ = 0. We can write

$$\sum_{i=1}^{n} (x_i - \mu)^2 = n(\bar x - \mu)^2 + S_x, \qquad \sum_{i=1}^{n} (y_i - \nu)^2 = n(\bar y - \nu)^2 + S_y,$$

where

$$\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad S_x = \sum_{i=1}^{n} (x_i - \bar x)^2, \quad S_y = \sum_{i=1}^{n} (y_i - \bar y)^2.$$
Then $n(\bar x - \mu)^2/\sigma^2$ and $n(\bar y - \nu)^2/\sigma^2$ are each distributed independently as $\chi^2$ with one degree of freedom, and each of $S_x/\sigma^2$ and $S_y/\sigma^2$ follows the $\chi^2$-law with n − 1 degrees of freedom. If we define

$$r^2 = (\bar x - \mu)^2 + (\bar y - \nu)^2, \qquad S_r = S_x + S_y, \tag{1}$$

then $nr^2/\sigma^2$ and $S_r/\sigma^2$ have independent $\chi^2$-distributions with 2 and 2n − 2 degrees of freedom, respectively.

It follows from this that

$$R = \frac{nr^2/(2\sigma^2)}{S_r/[(2n-2)\sigma^2]} = n(n-1)\,\frac{r^2}{S_r} = n(n-1)\,\frac{(\bar x - \mu)^2 + (\bar y - \nu)^2}{S_x + S_y} \tag{2}$$

has the F-distribution with 2 and 2n − 2 degrees of freedom.

Let us define $F_\alpha$ so that

$$\int_{F_\alpha}^{\infty} f_{2,\,2n-2}(F)\,dF = \alpha, \tag{3}$$

where $f_{2,\,2n-2}(F)$ is the density function of the F-distribution with 2 and 2n − 2 degrees of freedom and 0 < α < 1. Then the probability is α that the sample statistic R is greater than or equal to $F_\alpha$, i.e.,

$$P[R \ge F_\alpha] = \alpha. \tag{4}$$

In considering a sample value of R at significance level α, one rejects the hypothesis that the means are μ and ν, respectively, if R is larger than $F_\alpha$, i.e., larger than 1 and larger than the α significance point in Snedecor's tables [1].
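The following is a sketch of the test based on (2) to (4), assuming the observations are given as plain Python lists. For an F-distribution with 2 numerator degrees of freedom and 2n − 2 denominator degrees of freedom, the upper tail has the closed form $P[F \ge f] = (1 + f/(n-1))^{-(n-1)}$, so $F_\alpha$ in (3) can be computed without tables. The function names are hypothetical.

```python
def bivariate_R(xs, ys, mu, nu):
    """R of (2): n(n-1)[(xbar-mu)^2 + (ybar-nu)^2] / (S_x + S_y)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_x = sum((x - xbar) ** 2 for x in xs)
    s_y = sum((y - ybar) ** 2 for y in ys)
    return n * (n - 1) * ((xbar - mu) ** 2 + (ybar - nu) ** 2) / (s_x + s_y)


def upper_tail_F2(f, n):
    """P[F >= f] for F with 2 and 2n-2 degrees of freedom."""
    return (1.0 + f / (n - 1)) ** (-(n - 1))


def F_alpha(n, alpha):
    """Solve (3) by inverting upper_tail_F2 in closed form."""
    return (n - 1) * (alpha ** (-1.0 / (n - 1)) - 1.0)
```

One rejects at level α when `bivariate_R(xs, ys, mu, nu) >= F_alpha(len(xs), alpha)`. For n = 10 and α = 0.05 the closed form gives about 3.55, in line with the tabulated 5% point of F with 2 and 18 degrees of freedom.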

This F-test is a straightforward generalization to the bivariate case of the usual t-test as applied to the univariate case. In each case the sum of squares of distances of the observations from the population mean is broken up into the sum of squares of distances from the sample mean plus n times the square of the distance from the sample mean to the population mean. The t-test for the univariate case depends on the ratio of the distance of the sample mean from the population mean to the square root of the sum of squares of distances from the observations to the sample mean. The proposed F-test depends upon the ratio of the square of the distance of the sample mean from the population mean to the sum of squares of distances from the observations to the sample mean.

It can easily be shown that the likelihood ratio criterion for this hypothesis is

$$\lambda = \left[\frac{\sum_{i=1}^{n} (x_i - \bar x)^2 + \sum_{i=1}^{n} (y_i - \bar y)^2}{\sum_{i=1}^{n} (x_i - \mu)^2 + \sum_{i=1}^{n} (y_i - \nu)^2}\right]^{n}. \tag{5}$$

The hypothesis considered here is one of a class of hypotheses treated by Kolodziejczyk [2] in a paper in which he considers the likelihood ratio criterion for a set of general linear hypotheses.

Equation (4) may be written 
$$P\big[(\bar x - \mu)^2 + (\bar y - \nu)^2 \ge r_\alpha^2\big] = \alpha, \tag{6}$$

where $r_\alpha^2 = F_\alpha (S_x + S_y)/[n(n-1)]$. The probability is α that the distance from the sample means $\bar x, \bar y$ to the population means μ, ν is greater than or equal to $r_\alpha$. We may call $r_\alpha$ the fiducial radius [3], and the equation $(\bar x - \mu)^2 + (\bar y - \nu)^2 = r_\alpha^2$ defines the circle bounding the confidence region for the population means.

Suppose we have two samples of $n_1$ and $n_2$ pairs of observations, respectively, from normal bivariate distributions. If the population mean of each x variate is μ and the population mean of each y variate is ν, the population variance of each variate is $\sigma^2$, and the correlation coefficient is zero, then the sample means $\bar x_1$ and $\bar y_1$ of the first sample and $\bar x_2$ and $\bar y_2$ of the second sample follow normal distributions. Also $\bar x_1 - \bar x_2$ and $\bar y_1 - \bar y_2$ are normally distributed. Then, writing $r'^2 = (\bar x_1 - \bar x_2)^2 + (\bar y_1 - \bar y_2)^2$, the quantity

$$\frac{n_1 n_2\, r'^2}{(n_1 + n_2)\,\sigma^2}$$

has the $\chi^2$-distribution with 2 degrees of freedom. Let

$$S'_r = \sum_{i=1}^{n_1} (x_{1i} - \bar x_1)^2 + \sum_{i=1}^{n_1} (y_{1i} - \bar y_1)^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar x_2)^2 + \sum_{i=1}^{n_2} (y_{2i} - \bar y_2)^2,$$

where $x_{1i}, y_{1i}$ ($i = 1, 2, \ldots, n_1$) are the pairs of observations in the first sample and $x_{2i}, y_{2i}$ ($i = 1, 2, \ldots, n_2$) are the pairs of observations in the second sample. $S'_r/\sigma^2$ is distributed according to the $\chi^2$-distribution with $2n_1 + 2n_2 - 4$ degrees of freedom because it is the sum of quantities independently distributed as $\chi^2$. Then

$$R' = \frac{n_1 n_2\, r'^2 / [2(n_1 + n_2)\sigma^2]}{S'_r / [(2n_1 + 2n_2 - 4)\sigma^2]} = \frac{n_1 n_2 (n_1 + n_2 - 2)\, r'^2}{(n_1 + n_2)\, S'_r}$$

has the F-distribution with 2 and $2n_1 + 2n_2 - 4$ degrees of freedom. This fact yields a significance test for the hypothesis that both the means of the x variates and the means of the y variates for the two populations are the same. We can also set up confidence regions for $\mu_1 - \mu_2$ and $\nu_1 - \nu_2$.
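The two-sample statistic R′ translates directly into code. Below is a minimal sketch, assuming each sample is given as separate lists of x and y values; the function name is hypothetical.

```python
def two_sample_R(x1, y1, x2, y2):
    """R' = n1 n2 (n1 + n2 - 2) r'^2 / ((n1 + n2) S'_r) for two bivariate samples."""
    n1, n2 = len(x1), len(x2)

    def mean(v):
        return sum(v) / len(v)

    def ss(v):
        # sum of squared deviations from the sample mean
        mv = mean(v)
        return sum((t - mv) ** 2 for t in v)

    r2 = (mean(x1) - mean(x2)) ** 2 + (mean(y1) - mean(y2)) ** 2
    s_r = ss(x1) + ss(y1) + ss(x2) + ss(y2)
    return n1 * n2 * (n1 + n2 - 2) * r2 / ((n1 + n2) * s_r)
```

R′ is then referred to the F-distribution with 2 and $2n_1 + 2n_2 - 4$ degrees of freedom. For instance, `two_sample_R([0, 1, 2], [1, 1, 4], [3, 5], [0, 2])` has $r'^2 = 10$ and $S'_r = 12$, giving 3.0.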

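Now let us consider a sample from a normal bivariate population with means μ and ν, variances $\sigma_x^2$ and $\sigma_y^2$ and correlation coefficient ρ. Suppose $\gamma = \sigma_x^2/\sigma_y^2$ and ρ are known. The transformation

$$x = \frac{\sqrt{1+\rho}\,x' + \sqrt{1-\rho}\,y'}{\sqrt{2}}, \qquad y = \frac{\sqrt{1+\rho}\,x' - \sqrt{1-\rho}\,y'}{\sqrt{2\gamma}} \tag{8}$$

gives us the variates x′ and y′, which are distributed independently and with variance $\sigma_x^2$. Applying the results above, with μ′ and ν′ denoting the means of x′ and y′, we see that

$$R = n(n-1)\,\frac{(\bar x' - \mu')^2 + (\bar y' - \nu')^2}{\sum_{i=1}^{n} (x'_i - \bar x')^2 + \sum_{i=1}^{n} (y'_i - \bar y')^2} = n(n-1)\,\frac{(\bar x - \mu)^2 - 2\rho\sqrt{\gamma}\,(\bar x - \mu)(\bar y - \nu) + \gamma(\bar y - \nu)^2}{\sum_{i=1}^{n} (x_i - \bar x)^2 - 2\rho\sqrt{\gamma}\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) + \gamma\sum_{i=1}^{n} (y_i - \bar y)^2} \tag{9}$$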
has the F-distribution with 2 and 2n − 2 degrees of freedom. From this we derive significance tests, fiducial radii, and confidence regions as before.
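The algebra leading to (9) can be confirmed numerically. Applying statistic (2) to the transformed variates x′, y′ (with means μ′, ν′ obtained by the same linear map) must give the same number as the expression in the original variables. The sketch inverts (8) explicitly; the data are simulated and the parameter values are arbitrary, and the function names are hypothetical.

```python
import math
import random


def R_direct(xs, ys, mu, nu, rho, gamma):
    """Right-hand form of (9), in the original variables x, y."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    g = math.sqrt(gamma)
    num = (xbar - mu) ** 2 - 2 * rho * g * (xbar - mu) * (ybar - nu) + gamma * (ybar - nu) ** 2
    s_x = sum((x - xbar) ** 2 for x in xs)
    s_y = sum((y - ybar) ** 2 for y in ys)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return n * (n - 1) * num / (s_x - 2 * rho * g * s_xy + gamma * s_y)


def to_independent(x, y, rho, gamma):
    """Inverse of (8): x' = (x + sqrt(gamma) y)/sqrt(2(1+rho)), y' = (x - sqrt(gamma) y)/sqrt(2(1-rho))."""
    v = math.sqrt(gamma) * y
    return (x + v) / math.sqrt(2 * (1 + rho)), (x - v) / math.sqrt(2 * (1 - rho))


def R_transformed(xs, ys, mu, nu, rho, gamma):
    """Left-hand form of (9): statistic (2) applied to x', y' with means mu', nu'."""
    pts = [to_independent(x, y, rho, gamma) for x, y in zip(xs, ys)]
    xp = [p[0] for p in pts]
    yp = [p[1] for p in pts]
    mu_p, nu_p = to_independent(mu, nu, rho, gamma)   # the map is linear, so means map too
    n = len(xp)
    xbar, ybar = sum(xp) / n, sum(yp) / n
    s_r = sum((x - xbar) ** 2 for x in xp) + sum((y - ybar) ** 2 for y in yp)
    return n * (n - 1) * ((xbar - mu_p) ** 2 + (ybar - nu_p) ** 2) / s_r
```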

The above distributions, significance tests, and confidence regions are easily generalized to multivariate normal distributions. Suppose we have a sample of $n$ $k$-tuples of observations $\{x_{i\alpha}\}$ ($i = 1, 2, \dots, k$; $\alpha = 1, 2, \dots, n$) from a $k$-variate normal distribution. Let the expected value of each variate be zero, the variance of each variate be $\sigma^2$, and each correlation coefficient be zero. Then

$$R''^2 = \frac{n(n-1)\sum_{i=1}^{k} \bar x_i^2}{\sum_{i=1}^{k}\sum_{\alpha=1}^{n} (x_{i\alpha} - \bar x_i)^2} \tag{10}$$

has the $F$-distribution with $k$ and $k(n-1)$ degrees of freedom. Significance tests, confidence regions, and fiducial radii follow from this fact.
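Below is a sketch of the statistic (10). It assumes the observations are stored as a list of $k$ rows with $n$ values each, a layout chosen here for illustration; the names are hypothetical. The numerator and denominator are both quadratic in the data, so a common rescaling leaves the statistic unchanged, and the check below relies on that.

```python
def r2_k_variate(samples):
    """R''^2 of equation (10).

    samples[i][alpha] holds x_{i alpha}: one row per variate, n observations per row.
    The null hypothesis is zero means, equal variances and zero correlations.
    """
    n = len(samples[0])
    means = [sum(row) / n for row in samples]
    num = sum(m * m for m in means)
    den = sum((v - m) ** 2 for row, m in zip(samples, means) for v in row)
    return n * (n - 1) * num / den


def degrees_of_freedom(k, n):
    """Numerator and denominator degrees of freedom of the F-distribution of R''^2."""
    return k, k * (n - 1)
```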

3. Linear Regression. If one has a sample of $n$ pairs of observations $(x_1, y_1; x_2, y_2; \dots; x_n, y_n)$ from a normal bivariate population and wishes to fit a straight line to the scatter of sample points, one fits the line in such a way that the sum of squares of distances from the sample points to the line is a minimum ("error in both variates").

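It is easily shown that this line goes through the point whose coordinates are the sample means $(\bar x, \bar y)$. If the slope of a line through $(\bar x, \bar y)$ is $\tan\theta$, the distance from a sample point $(x_i, y_i)$ to the line is $(x_i - \bar x)\sin\theta - (y_i - \bar y)\cos\theta$. The sum of squares of distances from the sample points to the line is

$$\sin^2\theta\, S_x - 2\sin\theta\cos\theta\, S_{xy} + \cos^2\theta\, S_y,$$

where

$$S_x = \sum_{i=1}^{n} (x_i - \bar x)^2, \qquad S_y = \sum_{i=1}^{n} (y_i - \bar y)^2, \qquad S_{xy} = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y).$$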

If we minimize the above expression with respect to $\theta$ we find

$$b = \tan\theta = \frac{S_y - S_x \pm \sqrt{(S_y - S_x)^2 + 4S_{xy}^2}}{2S_{xy}}. \tag{11}$$

Using the plus sign gives $S_p$, the minimum sum of squared distances; using the minus sign gives $S_a$, the maximum sum of squared distances. (The latter value of $\tan\theta$ is the negative reciprocal of the former.)

$S_p$ is the sum of squared distances perpendicular to the regression line and $S_a$ is the sum of squared distances along the regression line. The sum $S_p + S_a$ is equal to $S_x + S_y$, which is the sum of squares of distances from the sample points to the point $(\bar x, \bar y)$. We have thus decomposed $S_x + S_y$ into two components, one perpendicular to the regression line and the other along the regression line.
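The computation of (11) and of the two components $S_p$ and $S_a$ can be sketched as follows. This is a minimal illustration that assumes $S_{xy} \ne 0$; `orthogonal_regression` is a hypothetical name. The checks use the sums from the illustrative example in Section 4. They confirm that $S_p + S_a = S_x + S_y$ and that $S_p S_a = S_x S_y - S_{xy}^2$.

```python
import math


def orthogonal_regression(sx, sy, sxy):
    """Slope b from (11) with the plus sign, and the components S_p and S_a.

    sx, sy, sxy are sums of squares and products of deviations; assumes sxy != 0.
    """
    root = math.sqrt((sy - sx) ** 2 + 4 * sxy ** 2)
    b = (sy - sx + root) / (2 * sxy)
    theta = math.atan(b)
    s, c = math.sin(theta), math.cos(theta)
    sp = s * s * sx - 2 * s * c * sxy + c * c * sy  # squared distances to the line
    sa = c * c * sx + 2 * s * c * sxy + s * s * sy  # squared distances along the line
    return b, sp, sa
```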



SIGNIFICANCE TESTS 




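The joint distribution of $S_p$ and $S_a$ may be derived from the Wishart distribution of the sums of squares and cross products.¹ When $\rho = 0$ and both variances are equal to $\sigma^2$, that distribution is

$$\frac{1}{4\pi\sigma^{2n-2}\,\Gamma(n-2)} \begin{vmatrix} S_x & S_{xy} \\ S_{xy} & S_y \end{vmatrix}^{(n-4)/2} e^{-(S_x+S_y)/(2\sigma^2)}\, dS_x\, dS_y\, dS_{xy}. \tag{12}$$

Let us make the transformation

$$S_x = \cos^2\theta\, S_a + \sin^2\theta\, S_p, \qquad S_y = \sin^2\theta\, S_a + \cos^2\theta\, S_p, \qquad S_{xy} = \sin\theta\cos\theta\,(S_a - S_p).$$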

The value of $\theta$ corresponds to the plus sign in (11). We find

$$S_x + S_y = S_p + S_a, \qquad \begin{vmatrix} S_x & S_{xy} \\ S_{xy} & S_y \end{vmatrix} = S_p S_a.$$


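The Jacobian of the transformation is $(S_a - S_p)$. Using these relations in (12) and integrating out $\theta$, we derive the distribution of $S_a$ and $S_p$:

$$\frac{1}{4\sigma^{2n-2}\,\Gamma(n-2)}\,(S_a S_p)^{(n-4)/2}\, e^{-(S_a+S_p)/(2\sigma^2)}\,(S_a - S_p)\, dS_a\, dS_p, \qquad S_a \ge S_p \ge 0. \tag{13}$$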


It can be shown that Sa and Sp are the characteristic roots of the sample vari- 
ance-covariance matrix. The distribution (13) of the characteristic roots of a 
variance-covariance matrix when the population correlation coefficient is zero 
and the variances are equal has been demonstrated by P. L. Hsu [4]. 

As a test of correlation (i.e., a test of significance of the regression coefficient) we propose using the ratio

$$F' = S_a/S_p.$$

This ratio is the maximum ratio of the sum of squared deviations in one direction to the sum of squared deviations in the perpendicular direction. It is intuitively evident that this ratio is probably near unity if the null hypothesis is true, that is, if the variances are equal and the correlation is zero. If the correlation is not zero, then the ratio is likely to be large.

From (13) we can deduce the distribution of $F'$ by transforming variables and integrating out the extraneous one. This procedure yields as the distribution of $F'$

$$(n-2)\,2^{n-3}\,F'^{(n-4)/2}\,(F'+1)^{-(n-1)}\,(F'-1)\,dF', \qquad F' \ge 1.$$

If we make the transformation

$$F' = e^{2z'},$$

we find the probability element of $z'$ to be

$$(n-2)\,(\cosh z')^{-(n-1)}\,d(\cosh z').$$

After integrating, we see that the cumulative distribution of $z'$ is

$$1 - (\cosh z')^{-(n-2)}.$$

¹ This distribution is equivalent to Fisher's distribution of the sample variances and correlation coefficient when the population correlation coefficient is zero.
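The cumulative distribution $1 - (\cosh z')^{-(n-2)}$ can be checked by simulation. The sketch below is illustrative only; the names and simulation settings are assumptions. It draws independent standard normal pairs, so that $\rho = 0$ and the variances are equal. For each sample it computes $z' = \tfrac12 \log F'$ from the two characteristic roots, then compares the observed tail frequency with $(\cosh z')^{-(n-2)}$.

```python
import math
import random


def f_prime(xs, ys):
    """F' = S_a / S_p, the ratio of the two characteristic roots of the SSP matrix."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    sx = sum((x - xb) ** 2 for x in xs)
    sy = sum((y - yb) ** 2 for y in ys)
    sxy = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    root = math.sqrt((sx - sy) ** 2 + 4 * sxy ** 2)
    return (sx + sy + root) / (sx + sy - root)


def z_prime_tail(z, n):
    """P(z' > z) = (cosh z)^-(n-2), from the cumulative distribution above."""
    return math.cosh(z) ** (-(n - 2))


def simulated_tail(z, n, reps=20000, seed=12345):
    """Monte Carlo estimate of P(z' > z) when rho = 0 and the variances are equal."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if 0.5 * math.log(f_prime(xs, ys)) > z:
            hits += 1
    return hits / reps
```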

Critical values of z' for various levels of significance may be determined from a 
table of hyperbolic cosines. Table I gives some values of z' and the corre- 
sponding values of F' . 


TABLE I

Percentage points for the z' (or F') distribution

The subscript of each column heading is the probability that z' (or F') exceeds the tabulated value.

  n    z'.20  z'.10  z'.05  z'.01  z'.001    F'.20  F'.10  F'.05  F'.01  F'.001
  3    2.292  2.993  3.688  5.298   ...       ...    398    ...    ...   4,000,000
  4    1.444  1.818  2.178  2.993  4.144     17.9   38.0    ...    398   4,000
  5     ...    ...   1.656  2.216  2.993      ...   16.5   27.4   84.2   398
  6    .958   1.178  1.381  1.818  2.412     6.79   10.6   15.8    ...   124
  7    .846    ...   1.207  1.572   ...      5.43   7.92   11.2   23.2   61.4
  8    .766   .933    ...    ...   1.818     4.63   6.47   8.74   16.5   38.0
  9     ...   .856   .992   1.276  1.643      ...   5.55   7.28   12.7   26.8
 10    .656   .796   .920   1.178   ...      3.71   4.91   6.30    ...   20.5
 11    .616   .746   .862    ...    ...      3.43   4.45   5.61    ...   16.5
 12    .583    ...   .813    ...   1.314     3.21   4.10    ...   7.92   13.9
 13    .554    ...   .772    ...   1.241      ...   3.82   4.68    ...   12.0
 14     ...   .639   .736   .933   1.178      ...   3.59   4.36   6.47   10.6
 15     ...   .613    ...   .892   1.124     2.76   3.41    ...    ...   9.47
 20    .429   .517    ...   .746    ...      2.36   2.81   3.27   4.46   6.47
 25    .378   .455   .522   .654   .814      2.13   2.48   2.84    ...   5.10
 30    .342   .411   .471   .589   .732      1.98   2.28   2.57   3.25   4.32
 40    .293   .352    ...   .502   .621       ...    ...   2.23   2.73   3.47
 60    .237   .284   .324    ...   .498      1.61   1.76   1.91   2.24   2.71
120    .165   .198   .226   .281   .345      1.39   1.49   1.57   1.75   2.00
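The upper tail of $z'$ is $(\cosh z')^{-(n-2)}$, so each entry of Table I solves $(\cosh z')^{-(n-2)} = P$. The short sketch below recomputes entries from this relation; its function names are hypothetical. Small differences from the printed table in the last digit may be expected from rounding.

```python
import math


def z_prime_point(n, alpha):
    """Upper alpha point of z': the z with (cosh z)^-(n-2) = alpha."""
    return math.acosh(alpha ** (-1.0 / (n - 2)))


def f_prime_point(n, alpha):
    """Corresponding upper alpha point of F' = exp(2 z')."""
    return math.exp(2.0 * z_prime_point(n, alpha))
```

For instance, the value $z' = 1.36$ in the illustrative example below exceeds `z_prime_point(15, 0.001)`.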


The use of $F'$ has been suggested here to test the hypothesis that the population correlation coefficient is zero when it is known that the variances of the two variates are the same, or, more generally, when the ratio of the two variances is known. This gives a test of significance of the regression coefficient when there is error in both variates, provided the ratio of the variances is known. The test arises from intuitive considerations. $F'$ can also be used to test the hypothesis that $\rho = 0$ and $\sigma_1^2 = \sigma_2^2$ ($H_1$ in Hsu's paper). C. T. Hsu [5] and J. W. Mauchly [6] have shown that the likelihood ratio criterion for this hypothesis is

$$\lambda = \left[\frac{4(S_xS_y - S_{xy}^2)}{(S_x + S_y)^2}\right]^{n/2} = \left[\frac{4F'}{(F' + 1)^2}\right]^{n/2}.$$

Since $\lambda$ decreases steadily as $F'$ increases beyond 1, the likelihood ratio test and the test based on $F'$ reject the hypothesis for the same samples.
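The equivalence with the likelihood ratio criterion can be seen numerically. The sketch below evaluates $\lambda$ once from the sums and once from $F'$. It assumes the exponent $n/2$ written above, although the monotone relation between $\lambda$ and $F'$ does not depend on that exponent. The function names are hypothetical.

```python
import math


def lr_from_sums(sx, sy, sxy, n):
    """Likelihood ratio criterion written with the sums of squares and products."""
    return (4 * (sx * sy - sxy ** 2) / (sx + sy) ** 2) ** (n / 2)


def lr_from_f_prime(fp, n):
    """The same criterion written as a function of F' = S_a / S_p."""
    return (4 * fp / (fp + 1) ** 2) ** (n / 2)
```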
If we set the normal density function equal to a constant, we determine a contour ellipse in the $x, y$-plane. Since these ellipses of constant probability density are circles when $\rho = 0$ and $\sigma_1^2 = \sigma_2^2$, Mauchly calls the test a test of circularity. The procedure used to test whether these ellipses are circles can also be used to test whether the ellipses have major axes in a certain direction and a specified ratio of axis lengths. Suppose we wish to test the hypothesis that the major axis is inclined to the $x$ axis at an angle $\theta$ and that the ratio of the length of the major axis to that of the minor axis is $k$. This is equivalent to the hypothesis that $\rho = \rho_0$ and $\sigma_1^2 = \gamma_0\sigma_2^2$. To test it, we rotate the coordinate axes of the variables through $\theta$, which changes the coordinates of all sample points, and change the scale of one of the new variables by the factor $k$. The transformation is

$$x = kx'\cos\theta - y'\sin\theta,$$

$$y = kx'\sin\theta + y'\cos\theta.$$

In terms of $x'$, $y'$ the null hypothesis is $\rho' = 0$, $\sigma_1'^2 = \sigma_2'^2$, and one proceeds as above. Of course, if $\gamma_0$ is known, then this method can be used to test the null hypothesis that $\rho = \rho_0$.
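Below is a sketch of the rotation and rescaling. It assumes that $\theta$ is measured from the $x$ axis to the major axis and that $k$ is the ratio of the axis lengths. The hypothetical helper `axes_from_hypothesis` finds $\theta$ and $k$ from a hypothesized $\rho_0$ and $\gamma_0$, using the principal axes of the covariance matrix. `to_standard_axes` inverts the transformation above, so the test for circularity can then be applied to $x'$, $y'$.

```python
import math


def axes_from_hypothesis(rho0, gamma0):
    """Angle theta of the major axis and axis-length ratio k when rho = rho0 and
    sigma_1^2 = gamma0 * sigma_2^2 (sigma_2 is set to 1; the scale is irrelevant)."""
    s11, s22, s12 = gamma0, 1.0, rho0 * math.sqrt(gamma0)
    theta = 0.5 * math.atan2(2.0 * s12, s11 - s22)
    half_trace = 0.5 * (s11 + s22)
    root = math.sqrt((0.5 * (s11 - s22)) ** 2 + s12 ** 2)
    k = math.sqrt((half_trace + root) / (half_trace - root))
    return theta, k


def to_standard_axes(x, y, theta, k):
    """Invert x = k x' cos(theta) - y' sin(theta), y = k x' sin(theta) + y' cos(theta)."""
    xp = (x * math.cos(theta) + y * math.sin(theta)) / k
    yp = -x * math.sin(theta) + y * math.cos(theta)
    return xp, yp
```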

4. Illustrative Example. An application of the formulae given above may be 
illustrated from the data in Table II, which gives two sets of electrical conductiv- 
ity measurements at different field strengths. The assumption that the two 
variances are equal is thus reasonable. 

TABLE II

Pairs of Observations of Electrical Conductivity

 x_i   y_i      x_i   y_i
 5.0   5.1      5.5   5.1
 7.4   7.0      5.3   5.0
 7.0   7.7      4.7   4.4
 8.8   7.7      8.6   7.1
 7.8   6.8      7.5   7.3
 5.1   5.5      5.6   6.3
 6.6   7.4      7.4   6.5
 8.8   7.7




Is it reasonable to regard x and y as being independently distributed in the 
population on the basis of these data? 

The sums of squares and cross products of deviations from the means, and the calculated slope, are:

$$S_x = 29.40, \qquad S_{xy} = 19.99, \qquad S_y = 18.04, \qquad b = 0.7554.$$

The maximized variance ratio is:

$$F' = \frac{S_x + 2bS_{xy} + b^2S_y}{b^2S_x - 2bS_{xy} + S_y} = \frac{69.89}{4.615} = 15.14, \qquad z' = \tfrac12 \log_e F' = 1.36.$$

Comparing with Table I for $n = 15$, we find this value of $z'$ very highly significant (probability less than 0.001). At this probability level, and on the basis of our data, $x$ and $y$ cannot be considered independent in the population.

Since the regression is significant, it is of interest to compute the calculated points $X_i$ and $Y_i$ on the regression line

$$Y = 1.35 + 0.7554\,X$$

corresponding to each observed point $x_i, y_i$. They are obtained from these equations:

$$Y_i = \bar y + \frac{b}{1+b^2}(x_i - \bar x) + \frac{b^2}{1+b^2}(y_i - \bar y) = .481x_i + .363y_i + .86,$$

$$X_i = \bar x + \frac{1}{1+b^2}(x_i - \bar x) + \frac{b}{1+b^2}(y_i - \bar y) = .637x_i + .481y_i - .65.$$

The minimized sum of squared deviations from the regression line (i.e., of squared distances between observed and calculated points) is the denominator of the expression for $F'$ divided by the factor $(1 + b^2)$:

$$4.615/1.5706 = 2.94.$$
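The arithmetic of this example can be reproduced with a short script. The sketch assumes Table II is read row by row into $(x_i, y_i)$ pairs; that reading reproduces the sums $S_x$, $S_y$, $S_{xy}$ quoted above. The function names are hypothetical. The sketch also confirms that the sum of squared distances to the calculated points equals the minimized sum $S_p$.

```python
import math

# Table II, read row by row as (x_i, y_i) pairs.
PAIRS = [(5.0, 5.1), (5.5, 5.1), (7.4, 7.0), (5.3, 5.0), (7.0, 7.7),
         (4.7, 4.4), (8.8, 7.7), (8.6, 7.1), (7.8, 6.8), (7.5, 7.3),
         (5.1, 5.5), (5.6, 6.3), (6.6, 7.4), (7.4, 6.5), (8.8, 7.7)]


def regression_summary(pairs):
    """Sums of squares and products, slope b, F', z' and the minimized sum S_p."""
    n = len(pairs)
    xb = sum(x for x, _ in pairs) / n
    yb = sum(y for _, y in pairs) / n
    sx = sum((x - xb) ** 2 for x, _ in pairs)
    sy = sum((y - yb) ** 2 for _, y in pairs)
    sxy = sum((x - xb) * (y - yb) for x, y in pairs)
    b = (sy - sx + math.sqrt((sy - sx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    num = sx + 2 * b * sxy + b * b * sy
    den = b * b * sx - 2 * b * sxy + sy
    fp = num / den
    return {"xbar": xb, "ybar": yb, "sx": sx, "sy": sy, "sxy": sxy, "b": b,
            "F'": fp, "z'": 0.5 * math.log(fp), "S_p": den / (1 + b * b)}


def foot_of_perpendicular(x, y, xb, yb, b):
    """Calculated point (X_i, Y_i) on the line through (xbar, ybar) with slope b."""
    t = ((x - xb) + b * (y - yb)) / (1 + b * b)
    return xb + t, yb + b * t
```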

It should perhaps be pointed out that the tests of the means described in the first 
part of this paper are no longer applicable since we do not know the population 
correlation coefficient. 
SYMMETRIC TESTS OF THE HYPOTHESIS THAT THE MEAN OF
ONE NORMAL POPULATION EXCEEDS THAT OF ANOTHER

By Herbert A. Simon
Illinois Institute of Technology

1. Introduction. One of the most commonly recurring statistical problems is to determine, on the basis of statistical evidence, which of two samples drawn from different universes came from the universe with the larger mean value of a particular variate. Let $M_y$ be the mean value which would be obtained with universe (Y) and $M_x$ the mean value which would be obtained with universe (X). Then a test may be constructed for the hypothesis¹ $M_y > M_x$.

Let $x_1, \dots, x_n$ be the observed values of the variate obtained from universe (X), and $y_1, \dots, y_n$ the observed values obtained from universe (Y). The sample space of the points $(x_1, \dots, x_n; y_1, \dots, y_n)$ may then be divided into three regions $\omega_0$, $\omega_1$, and $\omega_2$. If the sample point falls in the region $\omega_0$, the hypothesis $M_y > M_x$ is accepted; if it falls in the region $\omega_1$, the hypothesis $M_y > M_x$ is rejected; if it falls in the region $\omega_2$, judgment on the hypothesis is withheld. Regions $\omega_0$, $\omega_1$, and $\omega_2$ are mutually exclusive and, together, fill the entire sample space. Any such set of regions $\omega_0$, $\omega_1$, and $\omega_2$ defines a test for the hypothesis $M_y > M_x$.

In those cases, then, where the experimental results fall in the region $\omega_2$, the test leads to the conclusion that additional data are needed to establish a result beyond reasonable doubt. Under these conditions, the test does not afford any guide to an unavoidable or non-postponable choice. In the application of statistical findings to practical problems it often happens, however, that judgment cannot be held in abeyance: some choice must be made, even at a risk of error. For example, when planting time comes, a choice must be made between varieties (X) and (Y) of grain even if neither has been conclusively demonstrated, up to that time, to yield a larger crop than the other. The purpose of this paper is to propose a criterion which will always permit a choice between two experimental results, that is, a test in which the regions $\omega_0$ and $\omega_1$ fill the entire sample space. In the absence of a region $\omega_2$, any observed result is interpreted as a definite acceptance or rejection of the hypothesis tested.

2. General characteristics of the criterion. Let us designate the hypothesis $M_y > M_x$ as $H_0$ and the hypothesis $M_x > M_y$ as $H_1$. To suit our needs, a pair of tests $T_0$ and $T_1$ for $H_0$ and $H_1$ respectively must have the following properties:

(1) The regions $\omega_{00}$ and $\omega_{11}$ must coincide, as must the regions $\omega_{10}$ and $\omega_{01}$. (Here $\omega_{00}$ is the region of acceptance for $H_0$ and $\omega_{10}$ the region of rejection for $H_0$; $\omega_{01}$ and $\omega_{11}$ are the corresponding regions for $H_1$.) This correspondence means that when $H_0$ is accepted, $H_1$ is rejected, and vice versa. Hence, the tests $T_0$ and $T_1$ are identical, and we shall hereafter refer only to the former.

(2) There must be no regions $\omega_{20}$ and $\omega_{21}$. This means that judgment is never held in abeyance, no matter what sample is observed.

(3) The regions $\omega_{00}$ and $\omega_{10}$ must be so bounded that two probabilities are, in a certain sense, minimized. These are the probability of accepting $H_1$ when $H_0$ is true (error of the first kind for $T_0$) and the probability of accepting $H_0$ when $H_1$ is true (error of the second kind for $T_0$). Since $H_0$ and $H_1$ are composite hypotheses, the probability that a test will accept $H_1$ when $H_0$ is true depends upon which of the simple hypotheses that make up $H_0$ is true.

¹ This paper presupposes a familiarity with the theory of testing statistical hypotheses as set forth by J. Neyman and E. S. Pearson [1].

Neyman and Pearson [2] have proposed that a test $T_\alpha$ for a hypothesis be termed uniformly more powerful than another test $T_\beta$ under the following condition. For $T_\alpha$, the probability of accepting the hypothesis if it is false, and the probability of rejecting it if it is true, must not exceed the corresponding probabilities for $T_\beta$, no matter which of the simple hypotheses is actually true. Since no test is uniformly more powerful than all other possible tests, it is usually required that a test be uniformly most powerful (UMP) among the members of some specified class of tests.


3. A symmetric test when the two universes have equal standard deviations. Let us consider, first, the hypothesis $M_y > M_x$ where observations of varieties (X) and (Y) are drawn from normally distributed universes. These universes have equal standard deviations, $\sigma$, and means $M_x$ and $M_y$ respectively. Let us suppose that a sample of $n$ random observations is drawn from the universe of variety (X), and a sample of $n$ independent random observations from the universe of (Y). The probability distribution of points in the sample space is given by

$$p(x_1, \dots, x_n; y_1, \dots, y_n) = (2\pi\sigma^2)^{-n} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^n (x_i - M_x)^2 + \sum_{i=1}^n (y_i - M_y)^2\right]\right\}. \tag{1}$$

In testing the hypothesis $M_y > M_x$, there is a certain symmetry between the alternatives (X) and (Y). Suppose there is no a priori reason for choosing (X) rather than (Y), and the sample point $E_1: (a_1, \dots, a_n; b_1, \dots, b_n)$ falls in the region of acceptance of $H_0$. Then the point $E_2: (b_1, \dots, b_n; a_1, \dots, a_n)$ should fall in the region of acceptance of $H_1$. That is, if $E_1$ is taken as evidence that $M_y > M_x$, then $E_2$ can with equal plausibility be taken as evidence that $M_x > M_y$.

A test will be designated a symmetric test of the hypothesis $M_y > M_x$ if $E_1: (a_1, \dots, a_n; b_1, \dots, b_n)$ lies in $\omega_0$ whenever $E_2: (b_1, \dots, b_n; a_1, \dots, a_n)$ lies in $\omega_1$, and vice versa. Let $\Omega$ be the class of symmetric tests of $H_0$. If $T_\alpha$ is a member of $\Omega$ and is uniformly more powerful than every other member $T_\beta$ of $\Omega$, then $T_\alpha$ is the uniformly most powerful symmetric test of $H_0$.

The hypothesis $M_y > M_x$ possesses a UMP symmetric test. This may be shown as follows. From (1), the ratio can be calculated between the probability densities at the sample points $E: (x_1, \dots, x_n; y_1, \dots, y_n)$ and $E': (y_1, \dots, y_n; x_1, \dots, x_n)$. We get

$$\frac{p(E)}{p(E')} = \exp\left[\frac{n}{\sigma^2}(\bar x - \bar y)(M_x - M_y)\right], \tag{2}$$

where

$$\bar x = \frac1n \sum_{i=1}^n x_i, \qquad \bar y = \frac1n \sum_{i=1}^n y_i.$$

Now the condition $p(E) > p(E')$ is equivalent to $(\bar x - \bar y)(M_x - M_y) > 0$. Hence $p(E) > p(E')$ whenever $(\bar x - \bar y)$ has the same sign as $(M_x - M_y)$.
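Equation (2) can be verified by evaluating the logarithm of (1) at $E$ and at $E'$. The sketch below is illustrative, and its names are hypothetical. It performs that check and also states the resulting UMP symmetric rule: accept $M_y > M_x$ exactly when $\bar y > \bar x$.

```python
import math


def log_p(xs, ys, mx, my, sigma):
    """Logarithm of the joint density (1)."""
    n = len(xs)
    q = sum((x - mx) ** 2 for x in xs) + sum((y - my) ** 2 for y in ys)
    return -n * math.log(2 * math.pi * sigma ** 2) - q / (2 * sigma ** 2)


def log_ratio_eq2(xs, ys, mx, my, sigma):
    """log p(E)/p(E') according to (2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return n / sigma ** 2 * (xbar - ybar) * (mx - my)


def ump_symmetric_accepts_h0(xs, ys):
    """Accept M_y > M_x exactly when ybar > xbar."""
    return sum(ys) / len(ys) > sum(xs) / len(xs)
```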

Now for any symmetric test, if $E$ lies in $\omega_0$, $E'$ lies in $\omega_1$, and vice versa. Suppose that, in fact, $M_y > M_x$. Consider a symmetric test $T_\alpha$ whose region $\omega_0$ contains a sub-region $\omega_{0a}$ (of measure greater than zero) such that $\bar y < \bar x$ for every point in that sub-region. Then for every point $E'$ in $\omega_{0a}$, $p(E') < p(E)$, where $E$ is the symmetric point. Hence, a more powerful test $T_\beta$ could be constructed, identical with $T_\alpha$ except that $\omega_{0b}$, the sub-region symmetric to $\omega_{0a}$, would replace $\omega_{0a}$ as a portion of the region of acceptance for $H_0$. Therefore, a test whose $\omega_0$ contained all points for which $\bar y > \bar x$, and no others, would be a UMP symmetric test. This result is independent of the magnitude of $(M_y - M_x)$, provided only that $M_y > M_x$. We conclude that $\bar y > \bar x$ is a uniformly most powerful symmetric test for the hypothesis $M_y > M_x$.

The probability of committing an error with the UMP symmetric test is a simple function of the difference $|M_y - M_x|$. The exact value can be found by integrating (1) over the whole region of the sample space for which $\bar y < \bar x$. There is no need to distinguish errors of the first and second kind, since an error of the first kind with $T_0$ is an error of the second kind with $T_1$, and vice versa. The probability of an error is one half when $M_x = M_y$, and in all other cases is less than one half.
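Under the assumptions of this section, $\bar y - \bar x$ is normal with mean $M_y - M_x$ and variance $2\sigma^2/n$. The error probability is therefore $\Phi(-|M_y - M_x|\sqrt{n}/(\sigma\sqrt2))$, where $\Phi$ is the standard normal distribution function. Here is a minimal sketch of that formula, with hypothetical names:

```python
import math


def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def error_probability(delta, sigma, n):
    """P(wrong choice) for the rule 'accept M_y > M_x iff ybar > xbar'.

    ybar - xbar is normal with mean delta = M_y - M_x and variance 2 sigma^2 / n.
    """
    return normal_cdf(-abs(delta) * math.sqrt(n) / (sigma * math.sqrt(2.0)))
```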

4. Relation of UMP symmetric test and test which is UMP of tests absolutely equivalent to it. Neyman and Pearson [2] have shown that, for the hypothesis $M_y > M_x$, the test $\bar y - \bar x > k$ is UMP among the tests absolutely equivalent to it. They define a class of tests as absolutely equivalent if, for each simple hypothesis in $H_0$, the probability of an error of the first kind is exactly the same for all the tests in the class. If $k$ is set equal to zero, the criterion becomes $\bar y > \bar x$, and their test reduces to the UMP symmetric test. What is the relation between these two classes of tests?

If $T_\alpha$ is the UMP symmetric test, then it is clear from Section 3 that no other symmetric test $T_\beta$ is absolutely equivalent to $T_\alpha$. Hence $\Omega$, the class of symmetric tests, and $A$, the class of tests absolutely equivalent to $T_\alpha$, have only one member in common: the test $T_\alpha$ itself. Neyman and Pearson have shown $T_\alpha$ to be the UMP test of $A$, while the results of Section 3 show $T_\alpha$ to be the UMP test of $\Omega$.
5. Justification for employing a symmetric test. In introducing Section 3, a heuristic argument was advanced for using a symmetric, rather than an asymmetric, test for the hypothesis $M_y > M_x$. This argument will now be given a precise interpretation in terms of probabilities.

Assume, not a single experiment for testing the hypothesis $M_y > M_x$, but a series of similar experiments. Suppose a judgment as to the correctness of the hypothesis is formed independently on the basis of each experiment. Is there any test which, if applied to the evidence in each case, will maximize the probability of a correct judgment in that experiment? Such a test can be shown to exist, provided one further assumption is made. The assumption is that, if any criterion is applied prior to the experiment to test the hypothesis $M_y > M_x$, the probability of a correct decision will be one half. That is, it must be assumed that no evidence available prior to the experiment permits the variety with the greater yield to be selected with greater-than-chance frequency.

Consider now any asymmetric test for the hypothesis, that is, any test which is not symmetric. The criterion $\bar y - \bar x > k$, where $k > 0$, is an example of such a test. Unlike a symmetric test, an asymmetric test may give a different result when applied as a test of $H_0$ than when applied as a test of $H_1$. For instance, consider a sample point with $\bar y - \bar x = \epsilon$, where $k > \epsilon > 0$. If the above test were applied to $H_0$, this point would be considered a rejection of $H_0$ and an acceptance of $H_1$. If the test were applied to $H_1$, the same point would be considered a rejection of $H_1$ and an acceptance of $H_0$. Hence, before an asymmetric test can be applied to a problem of dichotomous choice (a problem where $H_0$ or $H_1$ must be determinately selected), a decision must be reached as to whether the test is to be applied to $H_0$ or to $H_1$. This decision cannot be based upon the evidence of the sample to be tested. In that case the complete test, which would of course include this preliminary decision, would be symmetric by definition.

Let $H_c$ be the correct hypothesis ($H_0$ or $H_1$, as the case may be) and let $H^*$ be the hypothesis to which the asymmetric test is applied. By assumption there is no prior evidence for deciding whether $H_c$ is $H_0$ or $H_1$, so we may employ any random process for deciding whether $H^*$ is to be identified with $H_0$ or $H_1$. If such a random selection is made, it follows that the probability that $H_c$ and $H^*$ are identical is one half.

We designate as the region of asymmetry of a test the region of points $E_1: (a_1, \dots, a_n; b_1, \dots, b_n)$ and $E_2: (b_1, \dots, b_n; a_1, \dots, a_n)$ such that $E_1$ and $E_2$ both fall in $\omega_0$ or both fall in $\omega_1$; this region is assumed to have aggregate measure greater than zero. Suppose $\omega_{0a}$ and $\omega_{0b}$ are a particular symmetrically disposed pair of subregions of the region of asymmetry which fall in $\omega_0$ of a test $T_0$. Suppose also that $\bar b > \bar a$ for every point $E_1$ in $\omega_{0a}$, and that $\omega_{0a}$ and $\omega_{0b}$ are of measure greater than zero. The sum of the probabilities that the sample point will fall in $\omega_{0a}$ or $\omega_{0b}$ is exactly the same whether $H_c$ and $H^*$ are the same hypothesis or are contradictory hypotheses. In the first case $H_c$ will be accepted; in the second case $H_c$ will be rejected. These two cases are of equal probability. Hence, if the sample point falls in the region of asymmetry of $T_0$, $H_c$ is accepted or rejected with probability one half each. But from equation (2) of Section 3 above, we see what would have happened if the subregions $\omega_{0a}$ and $\omega_{0b}$ had been in a region of symmetry, with $\omega_{0a}$ in $\omega_0$. In that case the probability of accepting $H_c$ would have been greater than the probability of rejecting $H_c$.

Hence, if it is determined by random selection to which of a pair of hypotheses 
an asymmetric test is going to be applied, the probability of a correct judgment 
with the asymmetric test will be less than if there were substituted for it the 
UMP symmetric test. It may be concluded that the UMP symmetric test is to 
be preferred unless there is prior evidence which permits a tentative selection of 
the correct hypothesis with greater-than-chance frequency. 
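The argument can be made concrete for the particular asymmetric criterion $\bar y - \bar x > k$ in the normal case. Write $s$ for the standard deviation of $\bar y - \bar x$ and $\Delta$ for the true difference. The sketch below averages the probability of a correct judgment over the two equally likely choices of $H^*$ and compares the result with the UMP symmetric test. It illustrates the argument under these assumptions and is not the paper's general proof; the names are hypothetical.

```python
import math


def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def p_correct_symmetric(delta, s):
    """UMP symmetric rule; s is the standard deviation of ybar - xbar."""
    return normal_cdf(abs(delta) / s)


def p_correct_asymmetric(delta, s, k):
    """Criterion 'difference > k' (k >= 0), applied to H_c or to the contradictory
    hypothesis with probability 1/2 each (random choice of H*)."""
    d = abs(delta)
    applied_to_true = normal_cdf((d - k) / s)   # true hypothesis accepted only if D > k
    applied_to_false = normal_cdf((d + k) / s)  # true hypothesis rejected only if -D > k
    return 0.5 * (applied_to_true + applied_to_false)
```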

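6. Symmetric test when standard deviations of universes are unequal.

Thus far, we have restricted ourselves to the case where $\sigma_x = \sigma_y$. Let us now relax this condition and see whether a UMP symmetric test for $M_y > M_x$ exists in this more general case.

We now have for the ratio of $p(E)$ to $p(E')$:

$$\frac{p(E)}{p(E')} = \exp\left[\frac n2\left(\frac{1}{\sigma_y^2} - \frac{1}{\sigma_x^2}\right)\left(\overline{x^2} - \overline{y^2}\right) + n(\bar x - \bar y)\left(\frac{M_x}{\sigma_x^2} - \frac{M_y}{\sigma_y^2}\right)\right], \tag{3}$$

where

$$\overline{x^2} = \frac1n\sum_{i=1}^n x_i^2, \qquad \overline{y^2} = \frac1n\sum_{i=1}^n y_i^2.$$

Even if $\sigma_y$ and $\sigma_x$ are known, which is not usually the case, there is no UMP symmetric test for the hypothesis $M_y > M_x$. From (3), consider the simple hypothesis $M_y = k_1$, $M_x = k_2$, with $k_1 > k_2$. The symmetric critical region with the lowest probability of errors of the first kind for this hypothesis is the set of points $E$ such that:

$$(\sigma_y^2 - \sigma_x^2)\left(\overline{x^2} - \overline{y^2}\right) - 2(\sigma_y^2 k_2 - \sigma_x^2 k_1)(\bar x - \bar y) \ge 0. \tag{4}$$

Since this region is not the same for all values of $k_1$ and $k_2$ such that $k_1 > k_2$, there is no UMP symmetric region for the composite hypothesis $M_y > M_x$. This result holds, a fortiori, when $\sigma_y$ and $\sigma_x$ are not known.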
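Because (3) and (4) involve several terms, a numerical check against the logarithm of the density is useful. In the sketch below the names are hypothetical, and $E'$ is obtained by exchanging the $x$ and $y$ samples. The check also confirms that the left side of (4) is a negative multiple of the logarithm of (3), evaluated with $M_y = k_1$ and $M_x = k_2$.

```python
import math


def log_p(xs, ys, mx, my, sx, sy):
    """Log density of independent x_i ~ N(M_x, sx^2) and y_i ~ N(M_y, sy^2)."""
    n = len(xs)
    return (-n * math.log(2 * math.pi * sx * sy)
            - sum((x - mx) ** 2 for x in xs) / (2 * sx ** 2)
            - sum((y - my) ** 2 for y in ys) / (2 * sy ** 2))


def _moments(xs, ys):
    n = len(xs)
    return (n, sum(xs) / n, sum(ys) / n,
            sum(x * x for x in xs) / n, sum(y * y for y in ys) / n)


def log_ratio_eq3(xs, ys, mx, my, sx, sy):
    """log p(E)/p(E') according to (3)."""
    n, xbar, ybar, x2bar, y2bar = _moments(xs, ys)
    return (n / 2 * (1 / sy ** 2 - 1 / sx ** 2) * (x2bar - y2bar)
            + n * (xbar - ybar) * (mx / sx ** 2 - my / sy ** 2))


def region4_lhs(xs, ys, k1, k2, sx, sy):
    """Left side of (4); E lies in the critical region when this is >= 0."""
    _, xbar, ybar, x2bar, y2bar = _moments(xs, ys)
    return ((sy ** 2 - sx ** 2) * (x2bar - y2bar)
            - 2 * (sy ** 2 * k2 - sx ** 2 * k1) * (xbar - ybar))
```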

If there is no UMP symmetric test for $M_y > M_x$ when $\sigma_y \ne \sigma_x$, we must be satisfied with a test which is UMP among some class of tests more restricted than the class of symmetric tests. Let us continue to restrict ourselves to the case where the sample contains an equal number of observations of (X) and of (Y). Let us pair the observations $x_i, y_i$, and consider the differences $u_i = x_i - y_i$. Is there a UMP test, among the tests which are symmetric with respect to the $u_i$'s, for the hypothesis that $M_y - M_x = -U > 0$? By a symmetric test in this case we mean a test such that whenever the point $(u_1, \dots, u_n)$ falls into region $\omega_0$, the point $(-u_1, \dots, -u_n)$ falls into region $\omega_1$.

If $x_i$ and $y_i$ are distributed normally about $M_x$ and $M_y$ with standard deviations $\sigma_x$ and $\sigma_y$ respectively, then $u_i$ will be normally distributed about $U = M_x - M_y$ with standard deviation $\sigma_u = \sqrt{\sigma_x^2 + \sigma_y^2}$. The ratio of probabilities for the sample points $E_u: (u_1, \dots, u_n)$ and $E_u': (-u_1, \dots, -u_n)$ is given by:

$$\frac{p(E_u)}{p(E_u')} = \exp\left(\frac{2n\bar u U}{\sigma_u^2}\right),$$

where

$$\bar u = \frac1n\sum_{i=1}^n u_i.$$

Hence $p(E_u) > p(E_u')$ whenever $\bar u$ has the same sign as $U$. Therefore, by the same process of reasoning as in Section 3, above, we may show that $\bar u < 0$ is a UMP test for the hypothesis $U < 0$ among tests symmetric in the sample space of the $u$'s.
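Below is a sketch of the paired-difference argument, with hypothetical names. It checks the probability ratio for $E_u$ and $E_u'$ against the normal density of the $u_i$, and it states the resulting rule $\bar u < 0$.

```python
import math


def log_normal_density(us, mean, sd):
    """Log joint density of independent N(mean, sd^2) observations."""
    return sum(-0.5 * math.log(2 * math.pi * sd * sd) - (u - mean) ** 2 / (2 * sd * sd)
               for u in us)


def log_ratio_paired(us, big_u, sd):
    """log p(E_u)/p(E_u') = 2 n ubar U / sigma_u^2."""
    n = len(us)
    return 2 * n * (sum(us) / n) * big_u / sd ** 2


def paired_symmetric_test(xs, ys):
    """Accept U < 0 (that is, M_y > M_x) iff the mean of u_i = x_i - y_i is negative."""
    us = [x - y for x, y in zip(xs, ys)]
    return sum(us) / len(us) < 0
```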

It should be emphasized that $\Omega_u$, the class of symmetric regions in the space of $E_u: (u_1, \dots, u_n)$, is far more restricted than $\Omega$, the class of symmetric regions in the sample space of $E: (x_1, \dots, x_n; y_1, \dots, y_n)$. The latter class includes all regions such that:

(A) $E_1: (a_1, \dots, a_n; b_1, \dots, b_n)$ falls in $\omega_0$ whenever $E_2: (b_1, \dots, b_n; a_1, \dots, a_n)$ falls in $\omega_1$.

Members of $\Omega_u$ must satisfy this condition together with the further condition:

(B) For all possible sets of $n$ constants $k_1, \dots, k_n$, the point $(x_1 + k_1, \dots, x_n + k_n; y_1 + k_1, \dots, y_n + k_n)$ falls in $\omega_0$ whenever $(x_1, \dots, x_n; y_1, \dots, y_n)$ falls in $\omega_0$.

When $\sigma_y \ne \sigma_x$, a UMP test for $M_y > M_x$ exists with respect to the symmetric class $\Omega_u$, but not with respect to the symmetric class $\Omega$.
ON INDICES OF DISPERSION


By Paul G. Hoel
University of California, Los Angeles

1. Introduction. In biological sciences the index of dispersion for the binomial 
and Poisson distributions is very useful for testing homogeneity of certain types 
of data. For example, the dilution technique in making blood counts finds it 
useful. Recently there have been attempts to use it to determine allergies by 
observing the change in the blood count after allergic foods have been taken. 
Here the sample may consist of only a few readings; consequently it is important 
to know how accurate this index is when applied to small samples. After in- 
specting the application of the Poisson index to such counts, I was surprised to 
see the lack of agreement with theory. At first it appeared that the fault lay 
with the chi-square approximation which is used on this index, but later it was 
clear that the assumption of a basic Poisson distribution was at fault. It now 
appears that statisticians will need to be careful about citing blood counts as 
examples of data following a Poisson distribution. 

This paper is the result of investigating the accuracy of the chi-square approximation for the distribution of these indices. Previous work on this problem seems to have consisted of some sampling experiments [1] for small values of the parameters involved, and some theoretical work [2] in which the sampling distribution is considered only for a fixed sample mean. Sampling distributions ordinarily differ very little from the distributions obtained by assuming the mean of the sample fixed. For small degrees of freedom, however, the difference may be appreciable and therefore requires investigation. In this paper the accuracy of the chi-square approximation is investigated by finding expressions for the descriptive moments of the distribution which are correct to terms of order $N^{-2}$. These expressions are obtained by means of Fisher's semi-invariant technique.


2. Moments of the distribution. Employing Fisher's notation [3], let the binomial index of dispersion be denoted by $z$; then $z$ may be written as:

$$z = \frac{\sum_{i=1}^{N} (x_i - \bar x)^2}{\bar x\,(1 - \bar x/n)} = \frac{(N-1)\,k_2}{k_1\,(1 - k_1/n)}.$$

Letting $w = k_1 - \kappa_1$, $y = k_2$, and $b = \dfrac{N-1}{\kappa_1(1 - \kappa_1/n)}$, $z$ may be expanded as follows:

$$z = b\,y\left\{1 + w\left(\frac{1}{n-\kappa_1} - \frac{1}{\kappa_1}\right) + w^2\left(\frac{1}{\kappa_1^2} - \frac{1}{\kappa_1(n-\kappa_1)} + \frac{1}{(n-\kappa_1)^2}\right) + \cdots\right\} = b\{y + c_1 wy + c_2 w^2 y + c_3 w^3 y + \cdots\},$$

where the definition of $c_i$ is obvious. As will be seen later, these expansions are valid for obtaining the expected values of powers of $z$; hence

$$\begin{aligned}
E(z) &= b\{\mu_{01} + c_1\mu_{11} + c_2\mu_{21} + \cdots\},\\
E(z^2) &= b^2\{\mu_{02} + 2c_1\mu_{12} + (2c_2 + c_1^2)\mu_{22} + (2c_3 + 2c_1c_2)\mu_{32} + \cdots\},\\
E(z^3) &= b^3\{\mu_{03} + 3c_1\mu_{13} + (3c_2 + 3c_1^2)\mu_{23} + (3c_3 + 6c_1c_2 + c_1^3)\mu_{33} + \cdots\},\\
E(z^4) &= b^4\{\mu_{04} + 4c_1\mu_{14} + (4c_2 + 6c_1^2)\mu_{24} + (4c_3 + 12c_1c_2 + 4c_1^3)\mu_{34} + \cdots\}.
\end{aligned} \tag{1}$$

Since only the first four moments of $z$ are to be found, it will be necessary to evaluate the moments $\mu_{ij} = E(w^i y^j)$ for $j = 1, 2, 3, 4$ and for $i = 0, 1, 2, \dots$, as far as necessary to give the desired degree of accuracy.

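First consider the relation between the moments $\mu_{ij}$ and the semi-invariants $\kappa_{ij}$, which are defined in terms of the $\mu_{ij}$ by the following formal identity in $t$ and $\tau$:

$$\exp\left\{\frac{\kappa_{10}t + \kappa_{01}\tau}{1!} + \frac{\kappa_{20}t^2 + 2\kappa_{11}t\tau + \kappa_{02}\tau^2}{2!} + \cdots\right\} = 1 + \frac{\mu_{10}t + \mu_{01}\tau}{1!} + \frac{\mu_{20}t^2 + 2\mu_{11}t\tau + \mu_{02}\tau^2}{2!} + \cdots$$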

Differentiating both sides with respect to $t$ and replacing the exponential factor by the right member gives an identity which is convenient for evaluating the $\mu_{i0}$. Differentiating both sides with respect to $\tau$ and making the same replacement gives an identity which is convenient for evaluating the $\mu_{ij}$ for $j > 0$.

These identities express $\mu_{ij}$ as a sum of products of $\kappa$'s and $\mu$'s, each such product being of total degree $i$ and $j$ in its subscripts. By repeated substitution, $\mu_{ij}$ can be expressed as a sum of products of $\kappa$'s only. From Fisher's formulas, each such semi-invariant $\kappa_{rs}$ can be expressed as a sum of products of semi-invariants of the basic distribution, each term of which is of order $N^{1-r-s}$ in $N$. Hence the lowest order term in $N$ in the expression for $\mu_{ij}$, or at least one of the lowest order terms, will be a term with the maximum number of $\kappa$ factors. Since the $\kappa_{rs}$ of lowest degree in subscripts are $\kappa_{10}$ and $\kappa_{01}$, the term with the maximum number of $\kappa$ factors would be the term in $\kappa_{10}^i\kappa_{01}^j$. However, since $w = k_1 - \kappa_1$ has a zero mean value, $\mu_{10} = \kappa_{10} = 0$. Consequently the lowest degree term involving the subscript $i > 0$ is $\kappa_{20}$ or $\kappa_{11}$. As a result, the maximum number of $\kappa$ factors will be found in the term containing $\kappa_{20}^{i/2}\kappa_{01}^j$ for $i$ even and $\kappa_{20}^{(i-3)/2}\kappa_{30}\kappa_{01}^j$ for $i$ odd. These terms are of order $N^{-i/2}$ and $N^{-(i+1)/2}$ respectively. Since it is desired to obtain accuracy of order $N^{-3}$, it will therefore suffice to evaluate $\mu_{ij}$ for $i \le 6$.

The validity of the expansions used in arriving at (1) could now be shown by writing them as partial sums with remainder terms and then showing that the remainder terms are of higher order than $N^{-3}$.

Neglecting terms of higher order than $N^{-3}$, the above identities give the following expressions for $\mu_{ij}$ for $j = 0, 1, 2$ and $i = 0, 1, \dots, 6$. The expressions for $j = 3$ and 4 are slightly longer.


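$$\begin{aligned}
\mu_{10} &= 0, & \mu_{01} &= \kappa_{01},\\
\mu_{20} &= \kappa_{20}, & \mu_{11} &= \kappa_{11},\\
\mu_{30} &= \kappa_{30}, & \mu_{21} &= \kappa_{21} + \kappa_{01}\mu_{20},\\
\mu_{40} &= \kappa_{40} + 3\kappa_{20}\mu_{20}, & \mu_{31} &= \kappa_{31} + 3\kappa_{11}\mu_{20} + \kappa_{01}\mu_{30},\\
\mu_{50} &= 6\kappa_{30}\mu_{20} + 4\kappa_{20}\mu_{30}, & \mu_{41} &= 6\kappa_{21}\mu_{20} + 4\kappa_{11}\mu_{30} + \kappa_{01}\mu_{40},\\
\mu_{60} &= 5\kappa_{20}\mu_{40}, & \mu_{51} &= 5\kappa_{11}\mu_{40} + \kappa_{01}\mu_{50},\\
& & \mu_{61} &= \kappa_{01}\mu_{60},
\end{aligned}$$

$$\begin{aligned}
\mu_{02} &= \kappa_{02} + \kappa_{01}\mu_{01},\\
\mu_{12} &= \kappa_{12} + \kappa_{11}\mu_{01} + \kappa_{01}\mu_{11},\\
\mu_{22} &= \kappa_{22} + \kappa_{21}\mu_{01} + \kappa_{02}\mu_{20} + 2\kappa_{11}\mu_{11} + \kappa_{01}\mu_{21},\\
\mu_{32} &= \kappa_{31}\mu_{01} + 3\kappa_{12}\mu_{20} + 3\kappa_{21}\mu_{11} + \kappa_{02}\mu_{30} + 3\kappa_{11}\mu_{21} + \kappa_{01}\mu_{31},\\
\mu_{42} &= 6\kappa_{21}\mu_{21} + \kappa_{02}\mu_{40} + 4\kappa_{11}\mu_{31} + \kappa_{01}\mu_{41},\\
\mu_{52} &= 5\kappa_{11}\mu_{41} + \kappa_{01}\mu_{51},\\
\mu_{62} &= \kappa_{01}\mu_{61}.
\end{aligned}$$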



The next step is to apply Fisher's formulas expressing the $\kappa_{rs}$ in terms of the semi-invariants of the basic variable distribution, which in this case is the binomial distribution. In Fisher's notation $\kappa_{rs}$ would be written as $\kappa(1^r 2^s)$, since the variables $w$ and $y$ are respectively $k_1$, measured from its expected value, and $k_2$. Applying such formulas gives the following expressions for the $\mu_{i1}$ and $\mu_{i2}$; the expressions for the $\mu_{i3}$ and $\mu_{i4}$ are somewhat longer.
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m = 




MSI 


2 

=s 4- 


«C6 , 4lC|fC2 


^ 7#C4<CS j_ 4iC8 _j_ 3k2 

" Iv^ 


25kj kI 
““ jyrs > 


M6i = 


15ici 

iV»' 


( 2 ) 


Moi 


Ml2 


_ jt* 1 2K»Kt r 

N L 


N - 1 


] 


/*« = + +3] + + i] + ^[j^i + 1] 


^ 5kbIC2 _l 7iC4K3 _j_ 7ksK2 

^ ~ 'W^ ~Jp ~w 


_ I61C41C2 _ j _ 20 «»K2 _^_ Zk - 

^ W m ■*■ 77* 

40ic» k \ 


[r?n 


m = 


w* = 


Ni 


m ’ 


It is necessary to express these κ's in terms of the parameters of the binomial
distribution. Here the κ's are defined by the following formal identity in θ:

$$e^{\kappa_1\theta + \kappa_2\theta^2/2! + \kappa_3\theta^3/3! + \cdots} = (q + pe^{\theta})^{n}.$$

Taking logarithms, expanding in powers of θ, and equating coefficients of powers
of θ, the following expressions are obtained:

$$\begin{aligned}
\kappa_1 &= m\\
\kappa_2 &= mq\\
\kappa_3 &= mq(q - p)\\
\kappa_4 &= mq(1 - 6pq)\\
\kappa_5 &= mq(q - p)(1 - 12pq)\\
\kappa_6 &= mq(1 - 30pq + 120p^2q^2)\\
\kappa_7 &= mq(q - p)(1 - 60pq + 360p^2q^2)\\
\kappa_8 &= mq(1 - 126pq + 1680p^2q^2 - 5040p^3q^3)
\end{aligned}$$
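The coefficient matching can be checked mechanically. The sketch below illustrates the computation and does not reproduce the author's own procedure. It obtains the cumulants of a single Bernoulli trial from its raw moments, all of which equal $p$, using the standard moment-to-cumulant recursion; this recursion is equivalent to expanding the logarithm. The result is then multiplied by the number of trials $n$, because cumulants of independent summands add. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_cumulants(p, r_max):
    """Return [None, kappa_1, ..., kappa_r_max] for one Bernoulli(p) trial.

    Uses kappa_r = mu'_r - sum_{k=1}^{r-1} C(r-1, k-1) kappa_k mu'_{r-k};
    every raw moment mu'_r (r >= 1) of a 0/1 variable equals p.
    """
    raw = [Fraction(1)] + [p] * r_max
    kappa = [None]  # index 0 unused so that kappa[r] is the rth cumulant
    for r in range(1, r_max + 1):
        kappa.append(raw[r] - sum(comb(r - 1, k - 1) * kappa[k] * raw[r - k]
                                  for k in range(1, r)))
    return kappa


def binomial_cumulants(n_trials, p, r_max=8):
    """Cumulants of a sum of n_trials independent trials: n_trials times each."""
    return [None] + [n_trials * c for c in bernoulli_cumulants(p, r_max)[1:]]


kappa = binomial_cumulants(6, Fraction(1, 3))  # n = 6, p = 1/3, so m = np = 2
print(kappa[1], kappa[2])  # 2 4/3
```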
These values of the κ's are inserted in (2) to give the following expressions for
the $\mu_{i1}$ and $\mu_{i2}$, with considerably longer expressions for the $\mu_{i3}$ and $\mu_{i4}$:

fiot == mq 


Hn = mg (g - p) ^ 


A 

m = mg I - 
A»8i = mg(g 


— 6pg , mg 

If 


) 


- -n A - 12pg . 4mg \ 


2 2 

M4i = m 5 




— 58pg , 3mg 

'JP 


) 


Uti = m*g*(g - p) 


4 4 15 


JV» 


A1Q2 


2mq 


/ 1 - 6p^ 

M12 = mg(g - p)^i “ 


+ ^^+mg 


) 


m 


Ita 


= mg^- 


- 30pg + 120p*g* 


4?ng 2mg 

Ar(Ar - 1) IT 

8mg(l — 5pg) 


) 


jV* 


+ 


= m*g*(g 


M42 


3 8 I » 

= m g ^ 


- P)^ 


12 — 102pg 




+ 


JV*(iV - 1) 

^ mg(5 - 26pg) ^ 

7mq \ 
AT*/ 


2 »* 


2 m g 


iV(iV - 1) 


+ 


m’g 

Iv 


0 


14mg 


Ar *( iv - 1 ) 


36 - 176pg 


N> 


+ 


6mg 


+ 


3mg 


Ar*(Ar - 1) ' AT* 


) 


4 4 , ^ 

M62 = m g (g - p) 

5 { 15 

P** • 

It remains to express the coefficients of (1) in terms of these same parameters.
From the definitions of $c_i$, $a$, and $\kappa_i$, it follows that

.♦■+1 


Ci 




a Ki 


p»^» + (-i )*g' 
m*q* 


• -<+» 
If now the above values of the $\mu_{ij}$ and $c_i$ are inserted in the expressions (1), the
following final formulas are obtained.

+ £ + (£)’ + (£)•+...} 

E(z') = W - l)’/l + ~ ^ 


(N — l)Nmq 


+ 


(Nmq)* 


_ 2(1 + 2pq- 25p*9*) 2pg(H- 3pg - 30p*<?^) \ 

(AT - iKNmqy (ATmg)® -f-'-J 


£« = W-l)‘{.+^-^ + 


8 


6(1 -3pg) 
{N - D* (A' - l)NTnq 

24(1 - 5pq) 


, 2pq(l - 5pg) , 4(1 - 4pg)(Ar - 2) 

{NmqY (AT - l)Wmg (AT - l)Wmg 

_ 6(1 — llpg + 40p*g* , 6pg(l — 16pg + 55p*g*) 

/XT t\/XT \0 • /XT \« 


(3) 


(AT - iXATmg)* 

60pg(l - 4pg)(Ar - 2) '! 

• /XT 1\t/XT \0 I f 


(Nmqy 


(AT - l)*(A'»tg)* 

E{z*) = (AT - 1)* /l H — H ^ — 

^ \ ^ AT - 1 Nmq ^ (AT - 1)* (AT - 

48 


12(1 + 2pq) 


2pq(2 - 21pq) 16(1 - 4pg)(Ar - 2) , 

"• “ /XT t\oAr "• 


{NmqY 


(AT - lyNmq 


i)Nmq 

8(15 - 46pg) 
(AT - ly (N - lyNmq 


_ 12(3 - 44pg + 138p*g*) 64pg(l - 4pg)(Ar - 2) 

(N - iXNmqy (AT - l)*(Ar»wg)*" 

96(1 - 4pg)(Ar - 2) 8(1 - 12pg + 36p*g*)(4A/'* - QAT + 6) 

• /XT t\9 XT • 


(AT — l)W»ig 

, 4p3(l - 43pg + 168p*g*) 
(Nmqy 


(AT - lyiNmqy 


By considering the formation of terms, it can also be shown that the above 
expressions are correct to terms of order m*, m*, m\ and m°, respectively, in the 
parameter m. If m is large these expressions are considerably more accurate 
than the order would indicate since the lowest order terms neglected in these 
expressions are respectively N^rn^y N^w?y N%^y and N^m. 


3. Applications. To compare these moments with those of the chi-square
distribution, consider the ratios of corresponding moments, both for the Poisson
distribution and for the binomial distribution in the special case of $p = 1/3$.

For the Poisson distribution, these ratios are
$$R_1 = 1$$

$$R_2 = 1 - \frac{1}{Nm} - \frac{1}{(Nm)^2}$$

$$R_3 = 1 + \frac{1}{2m} - \frac{4}{Nm}$$

= 1 + 


/1 + J_- 

1 3m* 


iV + 3 \m 3m* Nm) * 

For the binomial distribution with $p = 1/3$, these ratios are

Rl = 1 + -rp + 7 1 ^ + 7\T 

Nn {Nny (iVn)* 

“ n)(^ '^mn~ ~ 4(W 

-J- J_ Ai - ^ 4. -u _1_ 4 

iV*n V 2n^ 2nV iV»n* V 2 2n/ 


5 

2iNny 


J^h 

N + 3\n 


2N*n^ ’ 


From these expressions the following table is constructed. 


 m     n     N     R1     R2     R3     R4
25     ∞     3     1      .99    .97    1.01
25    75     3     1      1      .98    .97
 5     ∞     5     1      .96    .94    1.08
 5    15     5     1.01   .96    .87    .84
 2     ∞     ∞     1      1      1.25   1
 2     ∞    10     1      .95    1.05   1.19
 2     ∞     5     1      .89    .85    1.21
 2     6     ∞     1      .83    .59    .69
 2     6    10     1.02   .87    .64    .64
 2     6     5     1.03   .90    .69    .62
 1     ∞    25     1      .96    1.34   1.22
 1     ∞    10     1      .89    1.10   1.39
 1     ∞     5     1      .76    .70    1.44
 1     3    25     1.01   .69    .31    .41
 1     3    10     1.03   .72    .35    .38
 1     3     5     1.07   .77    .41    .36

(n = ∞ corresponds to the Poisson distribution.)
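As a rough consistency check on the Poisson rows of the table, the sketch below assumes that the Poisson ratios take the forms $R_1 = 1$, $R_2 = 1 - 1/(Nm) - 1/(Nm)^2$ and $R_3 = 1 + 1/(2m) - 4/(Nm)$. It then compares these values with the tabulated two-decimal entries. $R_4$ and the binomial ratios are not attempted here. The helper name is hypothetical.

```python
import math


def poisson_ratios(m, N):
    """R_1, R_2, R_3 for the Poisson index of dispersion under the assumed forms.

    N may be math.inf, in which case every 1/(Nm) term vanishes.
    """
    x = 1.0 / (N * m)  # 1/(Nm); equals 0.0 when N is infinite
    return 1.0, 1.0 - x - x * x, 1.0 + 1.0 / (2 * m) - 4.0 * x


# (m, N) -> (R1, R2, R3) as printed in the Poisson rows of the table.
table = {
    (25, 3): (1, .99, .97), (5, 5): (1, .96, .94), (2, math.inf): (1, 1, 1.25),
    (2, 10): (1, .95, 1.05), (2, 5): (1, .89, .85), (1, 25): (1, .96, 1.34),
    (1, 10): (1, .89, 1.10), (1, 5): (1, .76, .70),
}
for (m, N), printed in table.items():
    print(m, N, [round(r, 2) for r in poisson_ratios(m, N)], printed)
```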
For $m \ge 5$ these ratios are close to unity even for N as small as 5; hence it
appears that the chi-square approximation is satisfactory as long as $m \ge 5$.

For $m \le 2$ most of these ratios differ considerably from unity, particularly
for the binomial distribution. Since $R_1$ is practically constant, the reduction
in $R_2$ here indicates that the chi-square approximation will contain too many
extreme values. For the Poisson distribution there is an increase in $R_4$ to
compensate slightly for this decrease in $R_2$, so that the 5 percent points, for example,
would not differ very much. The use of the chi-square approximation would
therefore tend to give slightly too few significant results when they exist. For
the binomial distribution, however, there is a decrease in both $R_3$ and $R_4$, so
that the distribution tends toward normality; consequently the chi-square
approximation will contain far too many extreme values and the 5 percent point
will be much too large. This situation becomes slightly worse with increasing N.

4. Conclusions. From a consideration of the approximations for the first
four moments of the distribution of the index of dispersion, it appears that the
chi-square approximation is highly satisfactory provided that $m \ge 5$. For
smaller values of m, the approximation is still fairly accurate for the Poisson
distribution but not for the binomial distribution. For decreasing small values
of m there is an increasing tendency to claim compatibility between data and
theory when it does not exist; hence the binomial index must be handled
carefully in such situations. These general conclusions are in agreement with the
specialized results of Cochran and Sukhatme.

The semi-invariant technique for problems such as this is exceedingly laborious
and is of questionable accuracy. The coefficients in Fisher's heavier formulas
are so large that accuracy improves only slowly as terms of higher order are
retained. In addition, there are numerous typographical mistakes in
Fisher's formulas, some of which are not easily detected. The formulas (3)
may be used to investigate the accuracy of the chi-square approximation for
situations not covered in the numerical table, but when m is small they are of
questionable accuracy for N as small as 5.
ON SERIAL NUMBERS


By E. J. Gumbel
New School for Social Research

In this paper we consider a continuous variate and unclassified observations.
It is well known that there are two step functions which we may trace for a given
series of observations. We will show that the differences between the two ways
of plotting play an important role in certain graphical methods used by
engineers.

To obtain one and only one series of observations we adjust the cumulative
frequencies. The corrections thus introduced depend upon the theoretical
distribution which is adequate for the observations. Later we deal with the
relation between serial numbers and grades. Finally we construct confidence bands
for the comparison between theory and observations.

1. Theory and observations. If we arrange n observations in order of
increasing magnitude, and write each as often as it occurs, there will be a first,
$x_1$, the smallest value, a second, $x_2$, an mth, $x_m$, the penultimate, $x_{n-1}$, and the
last, $x_n$, i.e., the greatest value. The index m is called the observed cumulative
frequency, or simply the rank. It is usual to draw the observations $x_m$ along the
abscissa, and the rank m along the ordinate. The step function starts with a
vertical line from the value $x_1$ of the abscissa to the point with the coordinates
$1, x_1$, and in general consists of the horizontal lines from the point $m, x_m$ to the
point $m, x_{m+1}$ and the vertical lines from the point $m, x_{m+1}$ to the point $m + 1, x_{m+1}$.
The step function ends with the point $n, x_n$. We call this graph the
step function $(m, x_m)$. However, another step function, derived from
the observations arranged in decreasing magnitude, is equally legitimate. This
step function starts from the point with the coordinates $0, x_1$, and in general
consists of the horizontal lines from the point $m - 1, x_m$ to $m - 1, x_{m+1}$ and the
vertical lines from the point $m - 1, x_{m+1}$ to the point $m, x_{m+1}$, and ends with the
point $n - 1, x_n$. We call it the step function $(m - 1, x_m)$. Let F(x) be the
probability of a value equal to or less than x. Then the continuous theoretical
curve, the ogive, which we compare to the step functions is $nF(x), x$. The
question is whether we have to use the step function $(m, x_m)$ or the step function
$(m - 1, x_m)$.

The differences between the two ways of plotting are rarely mentioned in the
statistical literature. If we plot, instead of the rank m, the reduced rank $m/n$,
the differences between the two ways of plotting are of the order $1/n$. It is
generally tacitly assumed that this difference may be neglected. This will not
hold if n is small.

In the following we show two other ways of plotting the observations where
the differences between the two observed curves play an important role.
Suppose that the probability F(x) and the density of probability, f(x), henceforth
called the distribution, are such that it is possible to introduce a reduced variate

$$z = \frac{x - a}{b}, \qquad (1)$$

which has no dimension. In general, the constant a will be a certain mean, and
the constant b a certain measure of dispersion. Furthermore, the constants may
be linear functions of these characteristics. Neither the probability G(z) of a
value equal to or less than z,

$$G(z) = F(x), \qquad (2)$$

nor the reduced distribution

$$g(z) = b\,f(a + bz), \qquad (3)$$

contains constants. The equiprobability test consists in the following procedure:
We attribute to the mth observation $x_m$ the relative frequency $m/n$, and
determine from a probability table a value z such that

$$G(z) = m/n. \qquad (4)$$

The variate x is plotted on the ordinate, and the reduced variate z on the
abscissa. Then the points $x_m, z$ must be situated close to the straight line (1).
To apply this comparison between theory and observations, we need not even
calculate the constants. For the normal distribution the application of this test
is greatly facilitated by the use of probability paper.

The difficulty is that we may as well choose the frequency

$$G(z) = (m - 1)/n \qquad (4')$$

and determine the corresponding values of z. Therefore, we have two lines (1)
instead of one. The difference between the two series will be large for the
first and last few observations. For the first series the last observation cannot
be plotted on probability paper; for the second series the first observation
cannot be plotted.

The same difficulty exists for the "return period." If the observations of a
continuous variate are made at regular intervals in time which are taken as units,
we may, as in [4], define the theoretical return period T(x) of a value equal to or
greater than x as

$$T(x) = \frac{1}{1 - F(x)}. \qquad (5)$$

The comparison of the theoretical with the observed return periods gives a test
for the validity of a theory. However, there are two series of observations,
namely, the exceedance intervals

$$T(x_m) = \frac{n}{n - m}, \qquad m = 1, 2, \dots, n - 1, \qquad (6)$$

and the recurrence intervals

$$T(x_m) = \frac{n}{n - m + 1}, \qquad m = 1, 2, \dots, n. \qquad (7)$$

The two expressions (6) and (7) differ widely for the high ranks. The penultimate
observation, for example, has an exceedance interval n, whereas the recurrence
interval is only $n/2$. This contradiction and the difficulty arising for the
equiprobability test show that the question of choosing the observed cumulative
frequency of the mth observation has a practical significance.

The equiprobability test and the comparison between the observed and the
theoretical return period may be combined on probability paper. The variate x
is plotted on the ordinate, the reduced variate z on the abscissa. But instead of
z we write the probability F(x) and the return period T(x). If the theory holds,
the observations must be scattered around the straight line (1).

But all these methods presuppose that we know whether we have to attribute
to $x_m$ the rank m or the rank $m - 1$. Sometimes a compromise has been
proposed which consists in attributing to $x_m$ neither m nor $m - 1$, but the arithmetic
mean of both, $m - \tfrac12$. In other words, the index m is no longer considered to be
an integer. In such cases, we call m the serial number.

The corrected frequency $m - \tfrac12$ may be accepted for the comparison between
the step function and the probability curve. However, for the return period
and for the equiprobability test this method leads to serious difficulties. The
corrected return periods, which have been proposed by Hazen [7] and have been
used by M. Kimball [8], are

$$T(x_m) = \frac{n}{n - m + 1/2}. \qquad (8)$$

The last among n observations has a return period 2n. This idea does not seem
to be sound. No statistical device can increase the number of observations
beyond n.
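A small sketch shows how far apart the three competing return periods (6), (7) and (8) lie at the top ranks. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def exceedance_interval(m, n):
    """(6): n / (n - m), defined for m = 1, ..., n - 1."""
    return Fraction(n, n - m)


def recurrence_interval(m, n):
    """(7): n / (n - m + 1), defined for m = 1, ..., n."""
    return Fraction(n, n - m + 1)


def hazen_interval(m, n):
    """(8): n / (n - m + 1/2), the corrected return period."""
    return Fraction(n) / (n - m + Fraction(1, 2))


n = 10
print(exceedance_interval(n - 1, n), recurrence_interval(n - 1, n), hazen_interval(n, n))
# 10 5 20
```

For the penultimate of ten observations, the exceedance interval is 10 while the recurrence interval is 5. For the last observation, (8) assigns a return period of 20, which is twice the length of the record.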

2. The adjusted frequency of the mth observation. The use of m, $m - 1$, or
$m - \tfrac12$ as frequency of the mth observation amounts to considering the mth
value as being fixed. To obtain one and only one step function we consider $x_m$
as a statistical variate. This will lead to the determination of the most probable
serial number and of the corresponding probability as a function of m and n.

The mth observation is such that there are $m - 1$ observations below it and
$n - m$ observations above it. Consequently, the distribution $w_n(x, m)$ of the
mth observation is

$$w_n(x, m) = \frac{n!}{(m-1)!\,(n-m)!}\,F^{m-1}(x)\,[1 - F(x)]^{n-m}\,f(x). \qquad (9)$$





The variate $x_m$ is simply called x, as each value of x has a certain density of
probability of being the mth. To distinguish between f(x) and $w_n(x, m)$, the first
distribution is referred to as the initial distribution. For some simple initial
distributions it is possible to calculate exactly the mean and the standard error
of the distribution (9). This has been done by Karl Pearson [10] for the normal,
the uniform, the exponential, and other skew distributions. The results are very
complicated, and do not allow any immediate practical applications.

In the following we therefore determine, instead of the mean, the mode of the
mth value. The most probable mth value, for which we write $\tilde{x}_m$, is the solution
of

$$\frac{d \log w_n(x, m)}{dx} = 0.$$

We obtain from (9)

$$\frac{m - 1}{F(\tilde{x}_m)} - \frac{n - m}{1 - F(\tilde{x}_m)} = -\frac{f'(\tilde{x}_m)}{f^2(\tilde{x}_m)}. \qquad (10)$$


In this equation m is counted in order of increasing magnitude. If we choose
the inverse order we obtain the same equation if we replace the index m by
$n - m + 1$. Therefore the following results are independent of the order of
counting m.

Equation (10) gives the most probable value $\tilde{x}_m$ as a function of m and n.
The function depends upon the distribution.

A rough first trial solution of (10) may be obtained if we confine our interest
to values where neither m nor $n - m$ is small in comparison to n, that is, values
which are not extreme. Suppose m to be of the order $n/2$. For increasing
numbers of observations, the expression on the left side of (10) becomes large
compared to the right side, provided the derivative remains finite, as is generally the
case. If we neglect the right-hand member, $\tilde{x}_m$ is the solution of

$$F(\tilde{x}_m) = \frac{m - 1}{n - 1}. \qquad (11)$$

This expression holds exactly for the uniform distribution, where $f'(x) = 0$.

The following exact solution is valid for any number of observations and any
serial number. Equation (10) will be used in two different ways. First, we
suppose m to be known, we determine the probability $F(\tilde{x}_m)$ of the most probable
mth value as a function of m and n, and attribute this probability to the mth
observation $x_m$. By doing so, the probability of the most probable mth value
becomes the adjusted frequency of the mth observation. This leads to one and
only one series of observations, and settles our initial question. Later, in
section 3, we suppose $F(\tilde{x}_m)$ to be known, and compute the corresponding most
probable mth observation. This leads to an estimate of the grades (or partition
values) from the serial numbers.
To obtain $F(\tilde{x}_m)$ from (10) we introduce an expression by stating

$$[\sigma^2(\tilde{x}_m)\,n] = \frac{F(\tilde{x}_m)[1 - F(\tilde{x}_m)]}{f^2(\tilde{x}_m)}. \qquad (12)$$

The brackets are meant to indicate that the product on the left side does not
depend upon n. We shall show later that $\sigma^2(\tilde{x}_m)$ is, under certain conditions,
the variance of the mth observation. For the present purpose, however, this
significance is not required. Multiplying (10) by $F(\tilde{x}_m)[1 - F(\tilde{x}_m)]$ and using (12) leads to

$$m - 1 + F(\tilde{x}_m) - nF(\tilde{x}_m) = -f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)\,n], \qquad (13)$$

or

$$F(\tilde{x}_m) = \frac{m - 1}{n - 1} + \frac{f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)\,n]}{n - 1}. \qquad (14)$$

The adjusted frequency in (14) is similar to (11). Another expression for the
adjusted frequency, derived from (13), is

$$F(\tilde{x}_m) = \frac{m - 1/2}{n} + \frac{1}{n}\Big(F(\tilde{x}_m) - \tfrac12 + f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)\,n]\Big). \qquad (15)$$


The adjusted frequency is the compromise

$$\frac{m - 1/2}{n}$$

plus an expression

$$\frac{D}{n} = \frac{1}{n}\Big(F(\tilde{x}_m) - \tfrac12 + f'(\tilde{x}_m)[\sigma^2(\tilde{x}_m)\,n]\Big). \qquad (16)$$

The correction D, defined by (16), depends upon the initial distribution and has
no dimension. In general, it will depend upon the constants which exist in the
distribution. If the distribution f(x) may be written in a reduced form (3),
the correction*

$$D = G(z) - \tfrac12 + g'(z)[\sigma^2(z)\,n] \qquad (17)$$

depends only upon the dimensionless reduced variate z. For a given initial
distribution we choose numerical values for the probability $G(z) = F(\tilde{x}_m)$, and calculate
$g'(z)$ and

$$[\sigma^2(z)\,n] = \frac{G(z)(1 - G(z))}{g^2(z)}. \qquad (18)$$

From (16) we compute a table for the corrections D as a function of the adjusted
frequencies $F(\tilde{x}_m)$ and obtain, for given n, the serial number m as a function of
the adjusted frequencies by

$$m = nF(\tilde{x}_m) + \tfrac12 - D. \qquad (19)$$

These serial numbers will not be integers. The adjusted frequency $F(\tilde{x}_m)$ for
the mth observation will then be obtained by linear interpolation.
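The correction (17), with (18), is easy to tabulate numerically. The sketch below evaluates $D(z) = G(z) - \tfrac12 + g'(z)G(z)(1 - G(z))/g^2(z)$ for the reduced exponential distribution and the reduced normal distribution. For the normal case it uses Python's `statistics.NormalDist` together with $g'(z) = -z\,g(z)$. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def correction(G, g, dg, z):
    """D(z) = G(z) - 1/2 + g'(z) * [sigma^2(z) n], with [sigma^2(z) n] = G(1 - G) / g^2."""
    Gz = G(z)
    return Gz - 0.5 + dg(z) * Gz * (1.0 - Gz) / g(z) ** 2


# Reduced exponential distribution, z > 0.
def exp_G(z):
    return 1.0 - math.exp(-z)


def exp_g(z):
    return math.exp(-z)


def exp_dg(z):
    return -math.exp(-z)


# Reduced normal distribution; for the standard normal g'(z) = -z g(z).
STD = NormalDist()


def norm_dg(z):
    return -z * STD.pdf(z)


for z in (0.1, 1.0, 3.0):
    print(z, round(correction(exp_G, exp_g, exp_dg, z), 12),
          round(correction(STD.cdf, STD.pdf, norm_dg, z), 4))
```

For the exponential case the printed correction is $-0.5$ at every z. For the normal case the correction changes sign with z, as expected for a symmetrical distribution.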

* In previous articles [3, 6] we started from another interpretation of the corrected
frequencies and obtained slightly different corrections.
The value and the sign of the correction D depend upon the distribution.
For the asymmetrical exponential distribution, for example, the correction

$$D = -\tfrac12 \qquad (19')$$

is independent of the variate. This means that we have to use exclusively the
step function $(m - 1, x_m)$ as being the best way of plotting. The observed
adjusted return periods are the recurrence intervals.

For a symmetrical reduced distribution we have

$$1 - G(-z) = G(z); \qquad g(-z) = g(z); \qquad g'(-z) = -g'(z). \qquad (20)$$

Therefore, the reduced correction will be

$$D(-z) = -D(z). \qquad (21)$$

For the two reduced values z and $-z$ of a symmetrical variate the corrections
have the same size, but different signs.

A relation similar to (21) holds for two asymmetrical reduced distributions
$g_1(z)$ and ${}_1g(z)$, which are symmetrical one to another in the sense

$$G_1(z) = 1 - {}_1G(-z); \qquad g_1(z) = {}_1g(-z); \qquad g_1'(z) = -{}_1g'(-z). \qquad (22)$$

Then the corrections are

$$D_1(-z) = -{}_1D(z). \qquad (23)$$

For any initial distribution f(x) we read from (19) the adjusted frequency

$$F(\tilde{x}_m) = \frac{m - 1/2 + D}{n}, \qquad (24)$$

even for a small number of observations. The question whether to choose $m/n$
or $(m - 1)/n$ as observed cumulative frequency is settled by (24). We obtain
one observed step function, one series for the equiprobability test, and one
series of observed return periods

$$T(\tilde{x}_m) = \frac{n}{n - m + 1/2 - D}, \qquad (25)$$

which have to be compared to the theoretical continuous curves.
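Formulas (24) and (25) turn a given correction D into a single plotting position and a single return period. In the hypothetical helpers below, $D = -\tfrac12$ (the exponential case) gives back $(m - 1)/n$ and the recurrence intervals (7). $D = 0$ gives back the compromise $m - \tfrac12$ and Hazen's values (8).

```python
from fractions import Fraction


def adjusted_frequency(m, n, D):
    """(24): F = (m - 1/2 + D) / n for the mth of n observations."""
    return (m - Fraction(1, 2) + D) / n


def adjusted_return_period(m, n, D):
    """(25): T = n / (n - m + 1/2 - D), i.e. 1 / (1 - F)."""
    return n / (n - m + Fraction(1, 2) - D)


n = 10
half = Fraction(1, 2)
print(adjusted_frequency(3, n, -half), adjusted_return_period(n, n, -half),
      adjusted_return_period(n, n, 0))  # 1/5 10 20
```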


3. Estimates for the grades. In the following we use the fundamental
formula (15) to determine interesting grades through the mth values.

We use the term grade for the value of a statistical variate which corresponds
to a given cumulative probability F(x), say, $F(x) = \tfrac14, \tfrac34$ for quartiles; $F(x) = \tfrac{1}{10}, \tfrac{2}{10}, \dots, \tfrac{9}{10}$
for deciles, and so on. For a given grade, the probability F(x), the
density of probability f(x) and its derivative are known, and m is unknown.
The value of m obtained from (15), henceforth called the most probable serial
number $\tilde{m}$, is the solution of

$$\tilde{m} = nF(x) + 1 - F(x) - f'(x)F(x)(1 - F(x))f^{-2}(x). \qquad (26)$$

The corresponding "observed" value $\tilde{x}$ is obtained by interpolation between
two observed values $x_{m-1}$ and $x_m$ such that

$$m - 1 < \tilde{m} < m.$$


For the median, $x_0$, the most probable serial number $\tilde{m}_0$ is

$$\tilde{m}_0 = \frac{n + 1}{2} - \frac{f'(x_0)}{4f^2(x_0)}. \qquad (27)$$

The median $x_0$ itself enters into (27). It has to be eliminated through the
condition $F(x_0) = \tfrac12$. For the exponential distribution, for example, we find

$$\tilde{m}_0 = \frac{n}{2} + 1. \qquad (27')$$

The most probable serial number of the median for a symmetrical distribution is

$$\tilde{m}_0 = \tfrac12(n + 1). \qquad (28)$$

This is the usual estimate of the median for any distribution. The estimate
obtained from (27) is smaller (larger) than the usual estimate if the median is
smaller (larger) than the mode. The difference between the two estimates is
due to the fact that (27) makes use of information about the theoretical
distribution, whereas this information (if available) is neglected by the usual method.

For symmetrical distributions the most probable serial numbers $\tilde{m}_1$ and $\tilde{m}_2$
for two symmetrical grades defined by $F_1$ and $F_2 = 1 - F_1$ are, according to
(26), related by

$$\begin{aligned}
\tilde{m}_1 &= nF_1 + 1 - \big(F_1 + f_1'F_1(1 - F_1)f_1^{-2}\big),\\
\tilde{m}_2 &= n(1 - F_1) + \big(F_1 + f_1'F_1(1 - F_1)f_1^{-2}\big).
\end{aligned} \qquad (29)$$

The members in brackets have the same size, but opposite signs. Another
expression for $\tilde{m}_2$ is

$$\tilde{m}_2 = (n + 1) - \big[nF_1 + 1 - F_1 - f_1'F_1(1 - F_1)f_1^{-2}\big],$$

so that, for symmetrical distributions,

$$\tilde{m}_1 + \tilde{m}_2 = n + 1. \qquad (30)$$

This is to be expected, as the $m_1$th value counted upwards is the $(n - m_1 + 1)$st
value counted downwards.

For the two quartiles $q_1$ and $q_3$ the most probable serial numbers $\tilde{m}(q_1)$ and
$\tilde{m}(q_3)$ obtained from (29) are

$$\tilde{m}(q_1) = \frac{n + 3}{4} - \frac{3f'(q_1)}{16f^2(q_1)}; \qquad \tilde{m}(q_3) = \frac{3n + 1}{4} - \frac{3f'(q_3)}{16f^2(q_3)}, \qquad (31)$$

where $q_1$ and $q_3$ have to be eliminated by the use of

$$F(q_1) = \tfrac14; \qquad F(q_3) = \tfrac34.$$

For the uniform, the normal and the exponential distribution we obtain the two
quartiles from

$$\begin{aligned}
\tilde{m}(q_1) &= \frac{n + 3}{4}; & \tilde{m}(q_3) &= \frac{3n + 1}{4};\\
\tilde{m}(q_1) &= \frac{n}{4} + .352; & \tilde{m}(q_3) &= \frac{3n}{4} + .648;\\
\tilde{m}(q_1) &= \frac{n}{4} + 1; & \tilde{m}(q_3) &= \frac{3n}{4} + 1
\end{aligned} \qquad (31')$$

respectively. The last result may also be found from (19') and (24). These
estimates differ from the usual estimates for the reason given above.
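Formula (26) can be evaluated directly once F, f and f' are known at the grade. The hypothetical helper below reproduces two sets of constants from (31'). For the normal quartiles it uses `statistics.NormalDist` together with $f'(x) = -x f(x)$ for the standard normal. For the exponential distribution it uses $f = 1 - F$ and $f' = -(1 - F)$, which also gives the median value (27').

```python
from statistics import NormalDist


def most_probable_serial_number(n, F, f, df):
    """(26): m = nF + 1 - F - f'(x) F (1 - F) / f(x)**2, given F, f, f' at the grade x."""
    return n * F + 1 - F - df * F * (1 - F) / f ** 2


std = NormalDist()
n = 100
for prob in (0.25, 0.75):
    x = std.inv_cdf(prob)
    m = most_probable_serial_number(n, prob, std.pdf(x), -x * std.pdf(x))
    print(prob, round(m - n * prob, 3))  # 0.352 for q1, 0.648 for q3

# Exponential, f(x) = exp(-x): at the grade F, f = 1 - F and f' = -(1 - F).
for prob in (0.25, 0.5, 0.75):
    f = 1 - prob
    print(prob, most_probable_serial_number(n, prob, f, -f) - n * prob)  # 1.0 each
```

If $f' = 0$, as at the mode, the same helper reduces to $(n - 1)F + 1$.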

We now apply the notion of a grade to certain characteristics which are
otherwise defined. A certain characteristic, say, the mode $\tilde{x}$ or the mean $\bar{x}$, has for a
given distribution the probability $F(\tilde{x})$ or $F(\bar{x})$ respectively. These
probabilities may be used to define a grade. We determine the corresponding mth value
from (26), and obtain an estimate of the mode or the mean, interpreted as grades,
by interpolation between the observed mth values. For a symmetrical
distribution these estimates of the mode and mean are identical with the estimates
of the median. For an asymmetrical distribution, the most probable serial
number $\tilde{m}(\tilde{x})$ of the mode becomes, according to (26),

$$\tilde{m}(\tilde{x}) = (n - 1)F(\tilde{x}) + 1. \qquad (32)$$

Usually, the mode of a continuous variate is estimated by another procedure.
The observations are arranged in certain cells. One of them has the largest
relative frequency. It will contain the mode. To find its position within the
cell, an interpolation formula is applied which reproduces the content of this
cell and of the two adjacent cells. By choosing different lengths for the cells
and different origins for the classification, the mode can be shifted to the right or
to the left. Formula (32) furnishes a determination of the mode from the
observations according to the theory, such that the arrangement of the
observations into different cells is not needed. Of course, this method can be applied
only if we know the theoretical distribution f(x). The problem of how to estimate
the mode is important for distributions where one of the constants may be
interpreted as the mode or as a function of the mode [1, 4].

4. Standard errors of the estimates. The numerical work involved in the
method (26) of estimating the grades is very small. To obtain the standard
errors of these estimates we consider the asymptotic properties of the
distribution (9). The following results therefore hold only for large numbers of
observations. Besides, we assume that the serial number m is of the order $n/2$, i.e., not
extreme. It has been shown [2] that under these conditions the distribution
of the mth value converges toward a normal distribution with a standard error
$\sigma(x_m)$, where

$$\sigma(x_m)\sqrt{n} = \frac{\sqrt{F(x)(1 - F(x))}}{f(x)}. \qquad (33)$$

Although this standard error does not contain m explicitly, it has a clear meaning
for any value of x, as we know from (26) which observation we have to attribute
to the probability F(x). The classical proof of the approximate normality
of the distribution of the median in large samples is a special case of this
convergence, and the classical standard error of the median,

$$\sigma(x_0)\sqrt{n} = \frac{1}{2f(x_0)}, \qquad (34)$$

is a special case of (33). The square root in (33) is a maximum for $F(x) = \tfrac12$.
Therefore,

$$\sigma(x_m)\sqrt{n} \le \frac{1}{2f(x)}. \qquad (35)$$

If the variate x may be reduced through the linear transformation (1), the
standard error $\sigma(z)$ of the reduced variate, called the reduced standard error,

$$[\sigma(z)\sqrt{n}] = \frac{1}{g(z)}\sqrt{G(z)(1 - G(z))}, \qquad (36)$$

may be calculated as a function of z, where z corresponds to $x_m$. To call
attention to the fact that these numerical values do not depend upon n, they are
written in brackets. The standard error of the estimate for $x_m$ is, according to
(2) and (3),

$$\sigma(x_m) = \frac{b}{\sqrt{n}}\,[\sigma(z)\sqrt{n}]. \qquad (37)$$

Since the constant b is a measure of dispersion, the standard error of the estimate
of the mth value is proportional to the standard deviation of the variate. The
factors b and $1/\sqrt{n}$ show that the standard error of the mth value has the same
structure as the standard error of the arithmetic mean.

For symmetrical distributions the standard error (33) of the mth value is also
a symmetrical function. The standard errors of the estimates of the two
quartiles, and generally of the estimates of two grades defined by F and $1 - F$, are
then identical. If the mode coincides with the median, the corresponding
standard error of the mth value is a minimum. For a symmetrical U-shaped
distribution, however, where the density of probability passes through a minimum at
the center of symmetry, the median has the largest standard error among the
mth values. An example of such a distribution has been given by Leavens [9].
As the distribution of the mth value converges towards a normal distribution,
it is legitimate to attribute to the mode of the mth value the standard error (33).
Therefore, for a large number of observations, (33) gives the standard error of
our estimate of the grades. The standard errors of the estimates (31) of the
quartiles are

$$\sigma(q_1)\sqrt{n} = \frac{\sqrt{3}}{4f(q_1)}; \qquad \sigma(q_3)\sqrt{n} = \frac{\sqrt{3}}{4f(q_3)}. \qquad (38)$$
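For the standard normal distribution, (33) gives familiar values. The sketch below uses a hypothetical helper to compute $\sigma(x_m)\sqrt{n}$ for the median and for a quartile. By (34), the median value is $\sqrt{\pi/2} \approx 1.2533$ in units of σ. By (38), the quartile value is $\sqrt{3}/(4f(q_1))$.

```python
import math
from statistics import NormalDist


def se_times_root_n(F, f):
    """(33): sigma(x_m) * sqrt(n) = sqrt(F (1 - F)) / f, evaluated at the grade."""
    return math.sqrt(F * (1 - F)) / f


std = NormalDist()  # sigma = 1, so results are in units of the standard deviation
median_se = se_times_root_n(0.5, std.pdf(0.0))
quartile_se = se_times_root_n(0.25, std.pdf(std.inv_cdf(0.25)))
print(round(median_se, 4), round(quartile_se, 4))  # 1.2533 1.3626
```

Both values exceed the value 1 that (39) gives for the usual arithmetic mean of a normal sample.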

The arithmetic mean in its usual definition is not an mth value. Its standard
error $\sigma(\bar{x})$, where

$$\sigma(\bar{x})\sqrt{n} = \sigma, \qquad (39)$$

will, therefore, fall outside of the range of the standard errors of the mth values.
(See graph 1.) If the distribution f(x) is such that the standard deviation
does not exist, it is legitimate to estimate the arithmetic mean as a grade, and
calculate it from the corresponding most probable mth value by introducing
$F(\bar{x})$, $f(\bar{x})$ and $f'(\bar{x})$ into (26). The standard error of the arithmetic mean
interpreted as a grade is

$$\sigma(\bar{x})\sqrt{n} = \frac{1}{f(\bar{x})}\sqrt{F(\bar{x})(1 - F(\bar{x}))}. \qquad (40)$$

If we use this estimate of the arithmetic mean for distributions where σ exists,
the usual determination of the mean will be more (less) precise than its estimate
as a grade if

$$\sigma f(\bar{x}) \lessgtr \sqrt{F(\bar{x})(1 - F(\bar{x}))}. \qquad (41)$$

The standard error of the mode estimated as a grade is

$$\sigma(\tilde{x})\sqrt{n} = \frac{1}{f(\tilde{x})}\sqrt{F(\tilde{x})(1 - F(\tilde{x}))}. \qquad (42)$$

As the standard error of any characteristic depends upon the way it is estimated
from the observations, the standard errors of the mode or mean interpreted as
grades differ from the usual standard errors.


5. The most precise grade. Equation (33) may be used to define a new grade
which has interesting properties. The standard error (33) of the estimate of the
mth value is a function of F. We ask whether it possesses a minimum
(maximum). The corresponding value of the variate, $\check{x}$, may be called the most
(least) precise mth value or the most (least) precise grade. To obtain $\check{x}$ it is
sufficient, instead of $d\sigma(x_m)/dF$, to calculate from (33)

$$\frac{d \log[\sigma^2(x_m)\,n]}{dx} = \frac{2\sigma'(x_m)}{\sigma(x_m)}.$$

Therefore the most (least) precise grade is the solution of

$$\frac{f(x)}{F(x)} - \frac{f(x)}{1 - F(x)} - \frac{2f'(x)}{f(x)} = 0. \qquad (43)$$

This expression does not vanish if only one of the conditions $F(x) = \tfrac12$ or $f'(x) = 0$
holds. It vanishes if both conditions hold simultaneously. For a symmetrical
distribution passing through a mode (minimum), the mode (minimum), estimated
as a grade, is the most (least) precise grade. Equation (43) may be written

$$f'(x)f^{-2}(x)F(x)(1 - F(x)) = \tfrac12 - F(x).$$

If we introduce this expression into (16), we obtain $D = 0$ and

$$F(\check{x}) = \frac{m - 1/2}{n}. \qquad (44)$$

The most precise mth value is such that the adjusted frequency is the arithmetic mean
of the frequencies $m/n$ and $(m - 1)/n$.

The most precise mth value $\check{x}$ cannot be calculated from the observations
alone. It may be estimated in the same way as any grade by introducing the
values $F(\check{x})$, $f(\check{x})$ and $f'(\check{x})$ into equation (26).

To show the difference between the most precise grade and the mode we apply 
the procedure developed above to a skew distribution. The reduced distribu- 
tion of the largest value giy) and the probability G(y) are 

(45) giy) = e'^Giy); Giy) = 

The relation (1) between the reduced variate for which we write y instead of z 
and the largest value x is 


(46) 


= « + ^ 


a 


where u = x is the mode and 


(47) 


1 

a 




(T. 


The most probable serial number m(u) of the mode, obtained from (32) is 


(48) 


m(w) = 


n + e — I 
e 


This equation may be used for an estimate of the constant u. 

The reduced variance tr^iy) obtained from (36) and (45) is 

(49) (Ay)Vn) = - 1). 

A table for the reduced standard error a{y)\/n has been given in a previous 
publication [6]. The value <r(y)-s/n is plotted in figure 1 for probabilities G(y) 
from 0.01 to 0.96. The standard error has a minimum for a value of y located 
to the left of the mode y — 0. On the same graph are plotted the reduced 
standard errors for the normal distribution. As the normal reduced variate z 
differs from the reduced variate y, two different scales am used for the variates. 
The standard error of the estimate (48) of the mode interpreted as a grade, 
obtained by introducing y » 0 into (49) is 
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(490 «r(u)Vn «= - Ve"=n = 1.02205<r. 

a 

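The most precise grade is

$$\tilde{x} = u + \frac{\tilde{y}}{\alpha},$$

where $\tilde{y}$ is the value of the reduced variate for which the standard error (49) is a minimum. We obtain from (49) and (45) the numerical values

$$\tilde{y} = -.46601; \qquad G(\tilde{y}) = .20319; \qquad \sigma(\tilde{x})\sqrt{n} = .96887\,\sigma. \tag{50}$$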

The standard error of the most precise grade is thus 3 per cent smaller, and the standard error of the mode, estimated as a grade, 2 per cent larger, than the standard error of the mean.
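The values in (49') and (50) can be checked numerically. The sketch below minimizes the reduced standard error (49) with a golden-section search (a generic minimizer chosen here for illustration, not necessarily the author's procedure) and rescales by $1/\alpha = (\sqrt{6}/\pi)\sigma$ from (47). The function names are hypothetical.

```python
import math

# 1/alpha expressed in units of sigma, from (47)
SCALE = math.sqrt(6.0) / math.pi


def reduced_se(y: float) -> float:
    """sigma(y) * sqrt(n) for the largest-value distribution, from (49)."""
    return math.exp(y) * math.sqrt(math.exp(math.exp(-y)) - 1.0)


def golden_section_min(fn, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Locate the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fn(c) < fn(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


y_tilde = golden_section_min(reduced_se, -2.0, 2.0)
G_tilde = math.exp(-math.exp(-y_tilde))        # G(y) from (45)
se_mode = SCALE * reduced_se(0.0)              # (49'), in units of sigma
se_precise = SCALE * reduced_se(y_tilde)       # (50), in units of sigma

print(f"y~ = {y_tilde:.5f}, G(y~) = {G_tilde:.5f}")
print(f"mode as a grade: {se_mode:.5f} sigma; most precise grade: {se_precise:.5f} sigma")
```

Comparing the two ratios with 1 reproduces the "3 per cent smaller" and "2 per cent larger" statements, since the standard error of the mean is $\sigma/\sqrt{n}$.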
6. Confidence bands. The standard errors (33) of the grades may be used in a general way for the construction of confidence bands, obtained from curves which control the fit between theory and observation. Consider first the observed step function $m, x_m$ and the theoretical ogive $nF(x), x$. The variate $x$ is plotted along the abscissa, the cumulative frequency along the ordinate. Now, for large $n$, any theoretical value $x$ which is not extreme may be interpreted as an $m$th value having a normal distribution with standard error $\sigma(x_m)$. At each point of the graph of $nF(x), x$ which is not extreme, we construct a segment of length $2\sigma(x_m)$ parallel to the $x$ axis, the midpoint of the segment being on the theoretical ogive. In other words, we add the standard error $\sigma(x_m)$ to, and subtract it from, each corresponding value $x$, and attribute $nF(x)$ to the beginning and end of these intervals. By this procedure we obtain two curves $nF(x), x \pm \sigma(x_m)$. For each observation there is a probability $P = .68269$ that it will be contained within the interval $x \pm \sigma(x_m)$.

If we apply another hypothesis to the same observations, or choose other values for the constants, we reach, of course, other control curves. Of two competing hypotheses, the one to be preferred is that whose band contains the larger number of observations.

The same method may be applied to the equiprobability test and to the comparison of the observed and theoretical return periods [6]. This procedure is legitimate for all values which are not extreme.

In the following, we construct the confidence bands for the normal distribution

$$g(z) = \frac{1}{\sqrt{\pi}}\,e^{-z^{2}}. \tag{51}$$

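The variate $x$ is related to the reduced variate $z$ by (1), which, in this case, becomes

$$x = \bar{x} + \sigma\sqrt{2}\,z. \tag{52}$$

The probability $G(z)$ is

$$G(z) = \tfrac{1}{2}\bigl(1 + \Phi(z)\bigr), \tag{53}$$

where $\Phi(z)$ stands for the Gaussian integral

$$\Phi(z) = \frac{2}{\sqrt{\pi}}\int_{0}^{z} e^{-t^{2}}\,dt. \tag{54}$$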

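Formulae (36) and (53) lead to the reduced standard error

$$\sigma(z)\sqrt{n} = \frac{\sqrt{\pi}}{2}\,e^{z^{2}}\sqrt{1 - \Phi^{2}(z)}, \tag{55}$$

given in Table I, col. 6. The standard errors $\sigma(x_m)$ of the $m$th values, obtained from (37), (52) and (55), are

$$\sigma(x_m) = \frac{\sigma\sqrt{2}}{\sqrt{n}}\,\bigl[\sigma(z)\sqrt{n}\bigr]. \tag{56}$$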
As a numerical example we choose the annual precipitations observed at 51 meteorological stations in Paris and its surroundings in the year 1938. We suppose that the differences between the 51 observations are due only to chance. The step function $m, x_m$ is plotted in figure 2. To obtain the theoretical ogive we compute the constants in (52). They are

$$\bar{x} = 571.92; \qquad \sigma\sqrt{2} = 38.52. \tag{57}$$

The theoretical values $x$ obtained from (52), the cumulative frequencies $nF(x)$ obtained from the table of the Gaussian integral [11], and the standard errors

$$\sigma(x_m) = 5.393\,\bigl[\sigma(z)\sqrt{n}\bigr], \tag{58}$$

obtained from (56), are given in columns 2 to 5 and 7 of Table I.

We trace in figure 2 the theoretical curve $nF(x), x$ and the confidence band obtained from col. 7 by the method described above. All observations are contained within the control curves. We may therefore accept the hypothesis that the differences between the annual rainfalls observed at the 51 stations are due only to chance.
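The columns of Table I can be recomputed from (52) and (55) to (58). The sketch below assumes that $\Phi$ in (54) is the error function, so `math.erf` stands in for the printed Gaussian-integral table [11]. The helper name `table_row` is hypothetical, and recomputed values may differ from the printed table in the last digit because of rounding.

```python
import math

X_BAR = 571.92        # mean precipitation, from (57)
SIGMA_SQRT2 = 38.52   # sigma * sqrt(2), from (57)
N_OBS = 51            # number of stations


def table_row(z: float):
    """Columns 2 to 7 of Table I for a reduced variate z >= 0."""
    phi = math.erf(z)                                    # Phi(z) from (54)
    x_low = X_BAR - SIGMA_SQRT2 * z                      # (52) at -z
    x_high = X_BAR + SIGMA_SQRT2 * z                     # (52) at +z
    freq_low = N_OBS * 0.5 * (1.0 - phi)                 # nF(x) at -z, from (53)
    freq_high = N_OBS * 0.5 * (1.0 + phi)                # nF(x) at +z
    reduced = 0.5 * math.sqrt(math.pi) * math.exp(z * z) * math.sqrt(1.0 - phi * phi)  # (55)
    std_err = SIGMA_SQRT2 / math.sqrt(N_OBS) * reduced   # (56), i.e. (58)
    return x_low, x_high, freq_low, freq_high, reduced, std_err


for z in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    row = table_row(z)
    print(f"{z:3.1f}  " + "  ".join(f"{v:7.2f}" for v in row))

# Band edges: each theoretical x is shifted by -/+ sigma(x_m) while nF(x) is kept
x_lo, x_hi, f_lo, f_hi, _, se = table_row(0.4)
print("band at z = -0.4:", (x_lo - se, f_lo), (x_lo + se, f_lo))
```

Evaluating `table_row(1.4)` gives a lower value of about 518.0, which is how the misprinted entry in that row of the table can be checked.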


7. Conclusions. To test a statistical hypothesis for a continuous variate we use the ogive, the equiprobability method based on (1), and the return periods (5). The three tests may be combined on appropriate probability paper. As the rank of the $m$th observation $x_m$ may be taken as $m$ or $m - 1$, we have two series of observations. To obtain one and only one series we use for the ogive the serial number $m - \tfrac{1}{2}$, provided that the number of observations is large. Generally, we attribute to $x_m$ an adjusted frequency, namely, the probability (15) of the most probable $m$th value. The adjusted frequency is obtained from the serial number $m - \tfrac{1}{2}$ and a correction $D$, equation (17), which depends upon the distribution. The correction is important for all three tests when $n$ is small and, furthermore, for the equiprobability test and the return periods of the extreme observations for any $n$.

The same correction $D$ is used for estimating a grade through its relation (26) to the corresponding most probable serial number $m$. For distributions where the second moment does not exist, we estimate the arithmetic mean from a grade. For asymmetrical distributions we estimate the mode from a grade by (32) and (48).

TABLE I

Normal Confidence Band and Theoretical Frequencies of the Rainfalls

 (1)       (2)        (3)        (4)        (5)        (6)         (7)
Reduced    Variate    Variate    Frequency  Frequency  Reduced     Standard
variate                                                standard    error
                                                       error
  z        x at -z    x at +z    51F(x)     51F(x)     σ(z)√n      σ(x_m)

  0        571.9      571.9      25.50      25.50       .886       4.8
   .2      564.2      579.6      19.82      31.18       .899       4.9
   .4      556.5      587.3      14.58      36.42       .940       5.1
   .6      548.8      595.0      10.10      40.90      1.012       5.5
   .8      541.0      602.7       6.58      44.42      1.127       6.1
 1.0       533.4      610.4       4.01      46.99      1.297       7.0
 1.2       525.7      618.1       2.29      48.71
 1.4       518.0      625.9       1.22      49.78
 1.6       510.3      633.6        .60      50.40
 1.8       502.6      641.3        .28      50.72

In this case, we have to introduce a distinction between the mode and the most 
precise grade (43). The adjusted frequency and the estimates for grades may 
be used even for small numbers of observations. 

The standard error of these estimates is obtained, equation (33), from the limiting normal form of the distribution of the $m$th value, which holds provided the serial number is not extreme. To control a hypothesis we construct confidence bands, which are obtained from the standard errors of the grades.
FITTING GENERAL GRAM-CHARLIER SERIES


By Paul A. Samuelson 
Massachusetts Institute of Technology 

1. Introduction. Since the last part of the nineteenth century at least, it has 
been common to represent a probability distribution by means of a linear sum 
of terms consisting of a parent function and its successive derivatives. Usually
the parent function is the Type A or normal curve, as discussed by Gram [1], 
Bruns [2], Charlier [3], and numerous others. In addition there have been 
generalizations in various directions: for example, the Type B expansion in terms 
of the Poisson parent function and its successive finite differences. 

Unlike these two types, which have a definite probability interpretation, 
another generalization involves the use of other parent functions and their 
derivatives (or differences) to give an approximate representation of a given 
frequency curve. With this process are associated the names of Charlier, Carver [4], Roa [5], and many others. Two general methods by which the equating of moments of the fitted curve and the given distribution yields the appropriate coefficients have been given by Charlier and Carver respectively. An account of the latter's technique is more accessible to the average English-speaking statistician.

It is the purpose of the present discussion to indicate how the Charlier method 
may be simplified, and can be used to replace the Carver method. In doing 
so, I am following up the oral suggestion made some years ago by Professor 
E. B. Wilson of Harvard, that repeated integration by parts will yield the req- 
uisite coefficients very simply. At the same time certain methods implicit in 
the work of Dr. A. C. Aitken [6] show how the use of a moment generating 
function can often lighten the algebraic analysis. There will also be a brief 
indication of analogous results for general finite difference parent families; and 
attention will be called to a troublesome historical blunder which has per- 
meated the statistical literature. 

2. Alternative methods. Avoiding the overburdened expression generating function, I shall consider parent functions, called $f(x)$, with the restrictive properties:

a) Moments of all orders of $f(x)$ exist.

b) Derivatives of any required order exist with appropriate continuity.

c) There is high order contact at the extremities of the distribution, as defined below.

Mathematically,

a) $\int_{-\infty}^{+\infty} x^{k} f(x)\,dx$ is finite for all positive integral values of $k$,

and

c) $\lim_{x \to \pm\infty} x^{k} f^{(j)}(x) = 0$ for all positive integral values of $j$ and $k$.
These conditions suffice for many statistically interesting cases, but where desirable they can be lightened. Thus, derivatives may only be defined “almost everywhere,” and there may be finite instead of infinite limits to the distribution, etc.

Given an arbitrary frequency curve $F(x)$, we shall suppose it to be formally expanded in the series

$$F(x) \sim a_0 f(x) + a_1 f'(x) + a_2 f''(x) + \cdots + a_n f^{(n)}(x) + \cdots. \tag{1}$$


For convenience in what follows, we shall assume that all distributions are given in terms of relative frequency, so that the area under both $f$ and $F$ is equal to unity and $a_0$ may be taken as unity. The suppressed absolute frequency can clearly be restored at any time by multiplying both sides by the appropriate constant. Also for algebraic convenience, many writers consider the slightly modified form of the expansion

$$F(x) \sim A_0 f(x) - A_1 f'(x) + A_2 f''(x) - \cdots + (-1)^{n} A_n f^{(n)}(x) + \cdots.$$

It is assumed without discussion that the first n coefficients in such a series are 
to be determined by equating the first n moments of each side. 

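I shall prove the two following identities:

$$(-1)^{n} a_n = L_n(F) - \sum_{i=0}^{n-1} L_{n-i}(f)\,(-1)^{i} a_i, \tag{2}$$

where

$$L_k(f) = \frac{1}{k!}\int_{-\infty}^{+\infty} x^{k} f(x)\,dx,$$

and

$$(-1)^{n} a_n = \frac{1}{n!}\left[\frac{d^{n}}{d\alpha^{n}}\,\frac{M(\alpha; F)}{M(\alpha; f)}\right]_{\alpha = 0}. \tag{3}$$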
The first of these, which I owe to Prof. Wilson, is implicit in Charlier's work. The second, which may fairly be attributed to Aitken, may reduce the actual work in many special cases met in practice.

Both of these methods are closely related to the Charlier device of finding polynomials $S_n(x)$ with the bi-orthogonal property

$$\int_{-\infty}^{+\infty} S_n(x)\,f^{(j)}(x)\,dx = 0, \qquad j \neq n.$$


The subscript indicates the degree of the polynomial. By means of $n$ of the above relationships, the polynomials can be determined except for a factor of proportionality. By formal integration of both sides of our expansion, after multiplication by $S_n(x)$, we have the Charlier identity

$$a_n = \int_{-\infty}^{+\infty} S_n(x)\,F(x)\,dx \Big/ \text{factor of proportionality}.$$

From a theoretical standpoint, this method leaves little to be desired; but in 
practice the algebraic work increases rapidly with the number of terms to be 
included in the series. 

In the Carver method, the new parent function in question, as well as the 
function to be approximated, are both expanded in terms of the normal curve, 
thus almost doubling the numerical calculations. After some differentiation, 
the members of the Type A family are eliminated yielding in the process the 
required coefficients in terms of the new parent family. We shall see below how 
this method may be related to the three above. 


3. Useful relationship. First, two simple identities may be presented:

$$L_j\bigl(f^{(i)}\bigr) = (-1)^{i} L_{j-i}(f), \quad j \geq i; \qquad L_j\bigl(f^{(i)}\bigr) = 0, \quad j < i.$$

Given the above assumptions of high contact, this follows immediately from repeated integration by parts.

Remembering that the reduced moments defined just above are the coefficients of the powers of $\alpha$ in the series expansion of the moment generating function

$$M(\alpha; f) = \int_{-\infty}^{+\infty} e^{\alpha x} f(x)\,dx = L_0(f) + L_1(f)\,\alpha + L_2(f)\,\alpha^{2} + \cdots,$$

we have the useful Aitken identity

$$M\bigl(\alpha; f^{(i)}\bigr) = (-1)^{i} M(\alpha; f)\,\alpha^{i}. \tag{4}$$

This, too, is the immediate consequence of repeated integration by parts.


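4. Derivation of first method. Formally multiplying each side of (1) by $x^{n}/n!$ and integrating, we have the formal identity

$$L_n(F) = a_0 L_n(f) - a_1 L_{n-1}(f) + \cdots + (-1)^{n} a_n L_0(f).$$

This is a “triangular” system of linear equations in the unknown $a$'s. It may be written in matrix terms

$$\begin{bmatrix} L_0(F) \\ L_1(F) \\ L_2(F) \\ L_3(F) \\ \vdots \end{bmatrix} = \begin{bmatrix} L_0(f) & 0 & 0 & 0 & \cdots \\ L_1(f) & L_0(f) & 0 & 0 & \cdots \\ L_2(f) & L_1(f) & L_0(f) & 0 & \cdots \\ L_3(f) & L_2(f) & L_1(f) & L_0(f) & \cdots \\ \vdots & & & & \ddots \end{bmatrix} \begin{bmatrix} a_0 \\ -a_1 \\ a_2 \\ -a_3 \\ \vdots \end{bmatrix}.$$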
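Because the system is triangular, the $a$'s can be found one at a time by forward substitution, which is identity (2). The sketch below uses exact fractions and checks the result on an assumed test case: $f$ the unit normal curve and $F(x) = f(x + b)$, for which Taylor's theorem gives $a_k = b^{k}/k!$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def normal_reduced_moments(count: int) -> list:
    """L_k(f) = E[x^k]/k! for the unit normal curve (zero for odd k)."""
    return [Fraction(0) if k % 2 else Fraction(1, 2 ** (k // 2) * factorial(k // 2))
            for k in range(count)]


def shifted_reduced_moments(L_f: list, b: Fraction) -> list:
    """L_k(F) for F(x) = f(x + b): sum over j of (-b)^j / j! * L_{k-j}(f)."""
    return [sum((-b) ** j / factorial(j) * L_f[k - j] for j in range(k + 1))
            for k in range(len(L_f))]


def gram_charlier_coefficients(L_F: list, L_f: list) -> list:
    """Solve L_n(F) = sum_i (-1)^i a_i L_{n-i}(f) by forward substitution, as in (2)."""
    a = []
    for n in range(len(L_F)):
        known = sum((-1) ** i * a[i] * L_f[n - i] for i in range(n))
        signed = (L_F[n] - known) / L_f[0]      # this is (-1)^n a_n
        a.append((-1) ** n * signed)
    return a


b = Fraction(1, 2)
L_f = normal_reduced_moments(8)
L_F = shifted_reduced_moments(L_f, b)
coeffs = gram_charlier_coefficients(L_F, L_f)
print(coeffs)   # expected to equal b^k / k!
```

Exact rational arithmetic is used so that the check against $b^{k}/k!$ is an equality rather than a tolerance.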
The triangular matrix has the very special property that all of its elements are known as soon as the first column is given. For this reason, as we shall see, it is essentially equivalent to a simple sequence of numbers. This we shall call the sequence property. Because of this special form, the above system may, by simple rearrangement, be written in the modified form


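$$\begin{bmatrix} L_0(F) & 0 & 0 & \cdots \\ L_1(F) & L_0(F) & 0 & \cdots \\ L_2(F) & L_1(F) & L_0(F) & \cdots \\ \vdots & & & \ddots \end{bmatrix} = \begin{bmatrix} L_0(f) & 0 & 0 & \cdots \\ L_1(f) & L_0(f) & 0 & \cdots \\ L_2(f) & L_1(f) & L_0(f) & \cdots \\ \vdots & & & \ddots \end{bmatrix} \begin{bmatrix} a_0 & 0 & 0 & \cdots \\ -a_1 & a_0 & 0 & \cdots \\ a_2 & -a_1 & a_0 & \cdots \\ \vdots & & & \ddots \end{bmatrix}.$$

By appropriate definition of symbolism, this may be written in the simple matrix form

$$L(F) = L(f)\,a(F, f), \tag{5}$$

since multiplication of two triangular “sequence” matrices is commutative.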

It is usually simplest to invert this triangular system directly, as in (2). But if necessary, we may express our answer in the equivalent form

$$a(F, f) = L(F)\,L(f)^{-1}, \tag{6}$$

where the inverse of any such special triangular matrix also possesses the sequence property.

If $g$ is a second parent function with the properties of Section 2, we have the relationship

$$a(F, g) = a(F, f)\,a(f, g),$$

which follows directly from (5). This may be generalized to

$$a(f_1, f_2)\,a(f_2, f_3) \cdots a(f_{n-1}, f_n) = a(f_1, f_n).$$

If $F$ itself is a parent function, we have

$$a(F, f)\,a(f, F) = a(F, F) = I,$$

or

$$a(f, F) = a(F, f)^{-1}.$$

5. Relation to old methods. In terms of our notation, the Carver method seems to reduce to computing $a(F, f)$ by the relationship

$$a(F, f) = a(F, \phi)\,a(\phi, f),$$

where $\phi$ is the Type A parent function. It involves a doubling of the work of coefficient determination. However, if only a few terms in the expansion are retained, this is of negligible importance.
The Charlier polynomials are clearly summed rows of the matrix product

$$L(f)^{-1} \begin{bmatrix} 1 & 0 & 0 & \cdots \\ 0 & x/1! & 0 & \cdots \\ 0 & 0 & x^{2}/2! & \cdots \\ \vdots & & & \ddots \end{bmatrix}.$$

To know the first $n$ of these polynomials, it is not necessary to derive $n(n + 1)/2$ different coefficients. Because of the sequence property, it is only necessary to derive the $n$ elements of the first column of $L(f)^{-1}$. These can be expressed in terms of the reduced moments of $f$, as Charlier did; but the relationships are non-linear and become algebraically tedious for high $n$. They are better computed from sequence relationships.
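The sequence relationships can be sketched as follows: the first column $\bar{L}_k(f)$ of $L(f)^{-1}$ comes from a one-column recursion, and the rows of the product above then give $S_n(x) = \sum_k \bar{L}_{n-k}(f)\,x^{k}/k!$. For the unit normal curve (an assumed test case) these polynomials come out as the Hermite polynomials $\mathrm{He}_n(x)$ divided by $n!$. The code checks the bi-orthogonal property through the identities of Section 3 rather than by numerical integration. All names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def reciprocal_sequence(L: list) -> list:
    """First column of L^{-1}: sum_{i<=k} L_{k-i} Lbar_i = 1 if k == 0 else 0."""
    Lbar = [1 / Fraction(L[0])]
    for k in range(1, len(L)):
        Lbar.append(-sum(L[k - i] * Lbar[i] for i in range(k)) / L[0])
    return Lbar


def charlier_polynomial(Lbar: list, n: int) -> list:
    """Coefficients c_k of x^k in S_n(x) = sum_k Lbar_{n-k} x^k / k!."""
    return [Lbar[n - k] / factorial(k) for k in range(n + 1)]


def integral_S_times_derivative(Lbar: list, L: list, n: int, j: int):
    """Integral of S_n(x) f^{(j)}(x) dx, using L_k(f^{(j)}) = (-1)^j L_{k-j}(f)."""
    return sum(Lbar[n - k] * (-1) ** j * L[k - j] for k in range(j, n + 1))


# Assumed example: unit normal curve, with L_k = E[x^k]/k!
N = 7
L = [Fraction(0) if k % 2 else Fraction(1, 2 ** (k // 2) * factorial(k // 2)) for k in range(N)]
Lbar = reciprocal_sequence(L)
S3 = charlier_polynomial(Lbar, 3)
print(S3)   # coefficients of (x^3 - 3x)/3!, lowest power first
```

Only the $n$ numbers $\bar{L}_0, \ldots, \bar{L}_{n-1}$ are computed; every polynomial $S_n$ is then read off from them, which is the saving described in the text.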

The above discussion suggests that the bi-orthogonal relationship between a 
parent family and suitable polynomials has no deep significance. In particular, 
there is no essential relationship to least squares as in orthogonal expansions. 
It does, however, share one important property with orthogonal functions — 
determination of later coefficients does not affect the earlier ones. But this is a 
property of all triangular reductions, orthogonal expansions being only special 
cases of these. 

6. Sequence properties. Ordinarily, to derive the inverse of an $n \times n$ matrix, $n^{2}$ equations must be solved. For our triangular matrices, we need only solve $n$ equations for one column. To each triangular matrix $L(f)$ there corresponds a sequence $\{L_k(f)\}$, which is in fact the first column of the former. Similarly, to $L(f)^{-1}$ there corresponds $\{\bar{L}_k(f)\}$; the elements of the latter are defined by the $n$ equations

$$L_0(f)\,\bar{L}_0(f) = 1,$$

$$L_0(f)\,\bar{L}_1(f) + L_1(f)\,\bar{L}_0(f) = 0,$$

$$\cdots$$

$$\sum_{i=0}^{k} L_{k-i}(f)\,\bar{L}_i(f) = 0, \qquad k = 1, 2, \ldots, n - 1.$$

But these are precisely the equations involved in the formal inversion of any linear operator system of the form

$$\sum_{k=0}^{\infty} c_k\,h^{k} y = z, \tag{6}$$

where $h$ is an operator which commutes with a constant, and for which $h^{0} = 1$; $z$ is a known function and $y$ unknown. Thus $h$ may be such operators as

$$x, \quad d/dx, \quad x\,d/dx, \quad E, \quad \Delta.$$
A particular solution of (6) is given by the formal expansion

$$y = \sum_{k=0}^{\infty} \bar{c}_k\,h^{k} z,$$

where the $\bar{c}$'s bear the same relationship to the $c$'s as do the $\bar{L}$'s to the $L$'s.

Such “reciprocal” sequences appear in many branches of applied mathematics. In particular, they arise in the inversion of a power series. If, formally,

$$W(\alpha) = \sum_{k=0}^{\infty} L_k\,\alpha^{k},$$

then

$$\frac{1}{W(\alpha)} = \sum_{k=0}^{\infty} \bar{L}_k\,\alpha^{k}.$$

Thus, to any triangular matrix with the sequence property, we can formally associate a function $W(\alpha)$ as well as a sequence of numbers. The calculus of multiplication of our triangular matrices clearly “corresponds” to the calculus of multiplication of functions; i.e., if the triangular matrices $T_1, T_2, \ldots, T_n$ and the functions $W_1(\alpha), W_2(\alpha), \ldots, W_n(\alpha)$ correspond, and $T_n = T_1 T_2 \cdots T_{n-1}$, then

$$W_n(\alpha) = W_1(\alpha)\,W_2(\alpha) \cdots W_{n-1}(\alpha).$$

Also, $1/W_1(\alpha)$ corresponds to $T_1^{-1}$.
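The correspondence between triangular “sequence” matrices and power series can be checked directly. Multiplying two such matrices gives the triangular matrix of the convolved sequence, which is the coefficient sequence of $W_1(\alpha)W_2(\alpha)$. The product is the same in either order, and the reciprocal series gives the inverse matrix. The helper names are hypothetical and the example sequences are arbitrary.

```python
from fractions import Fraction


def sequence_matrix(seq: list) -> list:
    """Lower-triangular matrix whose (r, c) entry is seq[r - c] for r >= c."""
    n = len(seq)
    return [[seq[r - c] if r >= c else Fraction(0) for c in range(n)] for r in range(n)]


def matmul(A: list, B: list) -> list:
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def series_product(u: list, v: list) -> list:
    """First len(u) coefficients of (sum u_k a^k)(sum v_k a^k)."""
    return [sum(u[i] * v[k - i] for i in range(k + 1)) for k in range(len(u))]


def series_reciprocal(u: list) -> list:
    """Coefficients of 1/W(a), i.e. the reciprocal sequence."""
    out = [1 / Fraction(u[0])]
    for k in range(1, len(u)):
        out.append(-sum(u[k - i] * out[i] for i in range(k)) / u[0])
    return out


W1 = [Fraction(v) for v in (2, 1, 0, 3, 5)]
W2 = [Fraction(v) for v in (1, -4, 2, 0, 1)]
T1, T2 = sequence_matrix(W1), sequence_matrix(W2)
IDENTITY = sequence_matrix([Fraction(1)] + [Fraction(0)] * 4)

print(matmul(T1, T2) == sequence_matrix(series_product(W1, W2)))   # T1 T2 <-> W1 W2
print(matmul(T1, T2) == matmul(T2, T1))                             # commutative
print(matmul(T1, sequence_matrix(series_reciprocal(W1))) == IDENTITY)  # 1/W1 <-> T1^{-1}
```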


7. Moment generating functions. If only for the above reasons and no others, we should be tempted to consider the function formally defined by

$$\sum_{k=0}^{\infty} L_k(f)\,\alpha^{k}.$$

But this is precisely the expression for the familiar moment generating function (m.g.f.)

$$M(\alpha; f) = \int_{-\infty}^{+\infty} e^{\alpha x} f(x)\,dx = \sum_{k=0}^{\infty} L_k(f)\,\alpha^{k}.$$

In this way, the method of triangular matrices joins the method used by Aitken for the Type A family. If

$$F(x) \sim \sum_{i=0}^{\infty} a_i f^{(i)}(x),$$

and we formally equate moment generating functions of each side, we get

$$M(\alpha; F) = \sum_{i=0}^{\infty} (-1)^{i} a_i\,\alpha^{i}\,M(\alpha; f) \tag{7}$$

by means of the Aitken identity (4). Thus $(-1)^{i} a_i$ equals the coefficient of $\alpha^{i}$ in the formal expansion of

$$\frac{M(\alpha; F)}{M(\alpha; f)} = M(\alpha; F)\,M(\alpha; f)^{-1}.$$

Our relationship (2) follows immediately from (7); and by Taylor's expansion in $\alpha$ of $M(\alpha; f)^{-1}$, the identity (3) is quickly realized.

For many problems, the reciprocal of the m.g.f. of $f(x)$ is itself a simple function, so that our triangular equations may be inverted without solving linear equations. Thus, where $F(x) = f(x + b)$, we immediately verify Taylor's expansion by use of familiar properties of the m.g.f. under a shift of origin.
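As a further illustration of this remark (an assumed example, not one worked in the text), take $f$ the unit normal curve, whose m.g.f. $e^{\alpha^{2}/2}$ has the simple reciprocal $e^{-\alpha^{2}/2}$, and $F$ a normal curve of variance $s^{2}$. Then $M(\alpha;F)\,M(\alpha;f)^{-1} = e^{(s^{2}-1)\alpha^{2}/2}$, so $a_{2k} = \bigl((s^{2}-1)/2\bigr)^{k}/k!$ and the odd $a$'s vanish. The sketch multiplies truncated power series with exact fractions; no linear system is solved.

```python
from fractions import Fraction
from math import factorial


def exp_half_c_alpha2(c: Fraction, terms: int) -> list:
    """Power-series coefficients of exp(c * alpha^2 / 2) up to alpha^(terms - 1)."""
    out = [Fraction(0)] * terms
    for k in range((terms + 1) // 2):
        out[2 * k] = (c / 2) ** k / factorial(k)
    return out


def series_product(u: list, v: list) -> list:
    """First len(u) coefficients of the product of two power series."""
    return [sum(u[i] * v[k - i] for i in range(k + 1)) for k in range(len(u))]


TERMS = 9
s2 = Fraction(3, 2)                                # assumed variance of F
M_F = exp_half_c_alpha2(s2, TERMS)                 # m.g.f. of F: exp(s^2 alpha^2 / 2)
M_f_inv = exp_half_c_alpha2(Fraction(-1), TERMS)   # reciprocal m.g.f. of f
ratio = series_product(M_F, M_f_inv)               # coefficients are (-1)^i a_i, by (7)
a = [(-1) ** i * r for i, r in enumerate(ratio)]
print(a)
```

With $s^{2} = 3/2$ this gives $a_2 = 1/4$ and $a_4 = 1/32$.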


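8. Finite difference expansions. Corresponding to integration by parts, we have the formula

$$\sum_{-\infty}^{\infty} u_x\,\nabla^{k} v_x = (-1)\sum_{-\infty}^{\infty} \Delta u_x\,\nabla^{k-1} v_x = \cdots = (-1)^{k}\sum_{-\infty}^{\infty} \Delta^{k} u_x\,v_x,$$

provided “high contact” properties are assumed. $\nabla$ and $\Delta$ are receding and advancing differences respectively. Recalling the familiar property of “reduced factorial” polynomials, $\Delta\bigl[x^{(j)}/j!\bigr] = x^{(j-1)}/(j-1)!$, where $x^{(j)} = x(x-1)\cdots(x-j+1)$, we have

$$\sum_{-\infty}^{\infty} \frac{x^{(j)}}{j!}\,\nabla^{k} f(x) = (-1)^{k}\sum_{-\infty}^{\infty} \frac{x^{(j-k)}}{(j-k)!}\,f(x), \quad j \geq k; \qquad = 0, \quad j < k,$$

or

$$Q_j\bigl(\nabla^{k} f\bigr) = (-1)^{k} Q_{j-k}(f), \quad j \geq k; \qquad = 0, \quad j < k,$$

where

$$Q_j(f) = \sum_{-\infty}^{\infty} \frac{x^{(j)}}{j!}\,f(x).$$

In the expansion

$$F(x) \sim a_0 f(x) + a_1 \nabla f(x) + a_2 \nabla^{2} f(x) + \cdots$$

the $a$'s obey laws identical to (2) and (3), where the reduced factorial moments are substituted for the reduced $L$ moments, and the f.m.g.f.

$$\sum_{-\infty}^{\infty} f(x)\,(1 + \alpha)^{x}$$

for the ordinary m.g.f.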
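The summation-by-parts identity for the reduced factorial moments can be checked exactly for any finitely supported $f$. The sketch takes, as an assumed example, a binomial distribution with $N = 6$ and $p = 1/3$, builds $\nabla^{k} f$ by repeated receding differences, and compares $Q_j(\nabla^{k} f)$ with $(-1)^{k}Q_{j-k}(f)$, using $x^{(j)}/j! = \binom{x}{j}$. Helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

N, P = 6, Fraction(1, 3)   # assumed binomial example
f = {x: comb(N, x) * P ** x * (1 - P) ** (N - x) for x in range(N + 1)}


def receding_difference(g: dict) -> dict:
    """(nabla g)(x) = g(x) - g(x - 1); the support grows by one point on the right."""
    lo, hi = min(g), max(g) + 1
    return {x: g.get(x, 0) - g.get(x - 1, 0) for x in range(lo, hi + 1)}


def Q(g: dict, j: int) -> Fraction:
    """Reduced factorial moment Q_j(g) = sum over x >= 0 of C(x, j) g(x)."""
    return sum(comb(x, j) * v for x, v in g.items())


results = {}
g = f
for k in range(4):                 # g holds nabla^k f during pass k
    for j in range(6):
        expected = (-1) ** k * Q(f, j - k) if j >= k else 0
        results[(j, k)] = (Q(g, j), expected)
    g = receding_difference(g)

print(all(lhs == rhs for lhs, rhs in results.values()))
```

For the binomial, the f.m.g.f. of the text is $\sum_x f(x)(1+\alpha)^{x} = (1 + p\alpha)^{N}$, so $Q_j(f) = \binom{N}{j}p^{j}$, which the test below also confirms.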

9. Convergence. All of the above relationships are purely formal, without regard to convergence. The latter is a difficult subject, and little discussed in the statistical literature, since applications of G-C series have been almost entirely concerned with empirical frequency curve fitting, in which mathematical convergence does not enter. Actually, in the scanty treatments of the subject there has arisen a confusion between the Type A G-C expansion, which equates moments, and the expansion of a function in orthogonal Hermite functions. These are not unrelated, but nevertheless they are distinct. This is well recognized in the purely mathematical literature, but hardly at all in the literature of statistics and physics.

The series differ by an irremovable factor of 2. If the Type A functions are written as

$$f^{(k)}(x) = W_k(x)\,f(x),$$

then the Hermite functions will take the form

$$W_k(x)\,f(x)^{1/2},$$

where the $W_k$ are Hermite polynomials suitably normalized. Unfortunately, the G-C series often diverges when the H series converges. Thus, the statistically interesting Cauchy distribution can be expanded in an H series; but since it possesses no finite higher moments, the G-C series cannot even be defined.

It is not hard to show that the G-C expansion of $F$ in terms of a Type A function $f(x)$ is equivalent to an H expansion of $F f^{-1/2}$ in terms of the H family $f^{1/2}$. It is sufficient for convergence in the mean of the latter expansion that $F f^{-1/2}$ be of integrable square, i.e. belong to $L^{2}$. This means that the G-C Type A expansion will be valid if $F f^{-1/2}$ is well behaved, not simply if $F$ is well behaved. For $F$ a histogram, as is often the case in practice, no difficulties of convergence arise, although rapid convergence may be another matter. Nevertheless, many well behaved $F$'s will not pass the more strict test. The reader is referred to the last five titles in the bibliography for mathematical discussions of this problem.
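A worked instance of this criterion (an illustrative case, not taken from the text): let $f$ be the unit normal curve and $F$ a normal curve with the same mean and variance $s^{2}$. Then

```latex
\[
\frac{F(x)^{2}}{f(x)}
  = \frac{(2\pi s^{2})^{-1} e^{-x^{2}/s^{2}}}{(2\pi)^{-1/2} e^{-x^{2}/2}}
  = \frac{1}{\sqrt{2\pi}\,s^{2}}\,
    \exp\!\Bigl[-x^{2}\Bigl(\frac{1}{s^{2}} - \frac{1}{2}\Bigr)\Bigr],
\qquad
\int_{-\infty}^{\infty}\bigl(F f^{-1/2}\bigr)^{2}\,dx < \infty
  \iff s^{2} < 2 .
\]
```

So the sufficient condition is met when the variance of $F$ is less than twice that of the parent, and it fails otherwise, although $F$ itself is equally well behaved in both cases.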

The above discussion holds only for the Type A expansion. There remains the very difficult problem of convergence conditions in the more general case. No immediate generalization suggests itself, except the application of the results of the “moment problem.” However, this must be handled with delicacy, since the partial sums of the series may actually become negative over some range.
A METHOD OF TESTING THE HYPOTHESIS THAT TWO SAMPLES
ARE FROM THE SAME POPULATION

By Harold C. Mathisen
Princeton University

1. Introduction. There are many cases in testing whether two samples are 
from the same population in which no assumption about the distribution func- 
tion of the population can be made except that it is continuous. A. Wald and 
J. Wolfowitz, [1], have developed a method of testing the hypothesis that two
samples come from the same population based on certain kinds of runs of the 
elements from each sample in the combined ordered sample. W. J. Dixon, [2], 
has introduced a criterion for testing the same hypothesis based on the number 
of elements of the second sample falling between each successive pair of ordered 
values in the first sample. 

The problem considered here is that of devising a simple method of testing 
the hypothesis that two samples come from the same population, based on 
medians and quartiles, given only that the distribution function of the popula- 
tion is continuous. The simplest method may be described briefly as follows. 
We observe the number of elements, $m_1$, in the second sample whose values are lower than the median of the first sample. Since the distribution of $m_1$ is independent of the population distribution, we are able to compute significance points from the distribution of $m_1$. These points may then be used for testing the hypothesis at a given significance level. This will be referred to as the case of two intervals.

This method may be easily extended to the case of any number of intervals. 
In this note we shall consider the extension to four intervals by using the median 
and the two quartiles of the first sample to establish four intervals into which 
the elements of the second sample may fall. Then, if the second sample is of size $4m$, it will be shown that, under the hypothesis that the two samples come from the same population, $\tfrac{1}{4}$ of the second sample, or $m$ elements, will be expected to fall in each interval. Let the numbers in the second sample which actually fall in each interval be $m_1$, $m_2$, $m_3$, and $m_4$ respectively. The test function here proposed is

$$C = \frac{(m_1 - m)^{2} + (m_2 - m)^{2} + (m_3 - m)^{2} + (m_4 - m)^{2}}{9m^{2}}, \tag{1}$$

where $9m^{2}$ is a constant which keeps $C$ on the interval 0 to 1 except when nearly all of the second sample falls in a single interval. If the $m_i$ $(i = 1, 2, 3, 4)$ have values quite different from their expected value $m$, it is apparent that $C$ will be large. Therefore the greater the value of $C$, the more doubtful is the hypothesis that the two samples come from the same population. Significance values of $C$ will be computed for several sample sizes. The question of whether $C$ is the “best” four-interval criterion for testing the hypothesis that two samples come from the same continuous distribution is an open one, which would depend for its answer on an extensive power function analysis. We shall not go into this analysis, however, but shall use $C$ on intuitive grounds. This case will be referred to as the case of four intervals. The extension of the method of the case of four intervals to any number of intervals presents no new difficulties in derivation; however, we shall confine our attention to the cases of two and four intervals.


2. The case of two intervals. Suppose $F(x)$ is a continuous distribution function with probability element $f(x)\,dx$. Let us draw a sample of size $2n + 1$ from a population having this probability element. Let the elements in the sample be $x_1, x_2, \ldots, x_{2n+1}$, ordered from least to greatest. The median of this sample will be $x_{n+1}$. Now consider a second sample of size $2m$, and let $m_1$ be the number of observations whose values are less than $x_{n+1}$. We call $m_2 = 2m - m_1$ the number of elements in the second sample greater than $x_{n+1}$.

Let

$$p = \int_{-\infty}^{x_{n+1}} f(x)\,dx$$

be the probability of an observation having a value less than $x_{n+1}$. Then the probability of an element having a value greater than $x_{n+1}$ is $(1 - p)$. Thus we have the relation $f(x_{n+1})\,dx_{n+1} = dp$. The probability law of the median $x_{n+1}$, given by the multinomial law¹, is

$$\Pr(x_{n+1}) = \frac{(2n + 1)!}{n!\,1!\,n!}\,p^{n}(1 - p)^{n}\,dp. \tag{2}$$

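The conditional probability law of $m_1$, given $x_{n+1}$, is then

$$\Pr(m_1 \mid x_{n+1}) = \frac{(2m)!}{m_1!\,(2m - m_1)!}\,p^{m_1}(1 - p)^{2m - m_1}. \tag{3}$$

From this it follows that the joint probability law of $x_{n+1}$ and $m_1$ is the product of (2) and (3), or

$$\Pr(m_1, x_{n+1}) = \frac{(2n + 1)!\,(2m)!}{n!\,n!\,m_1!\,(2m - m_1)!}\,p^{n + m_1}(1 - p)^{n + 2m - m_1}\,dp. \tag{4}$$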


We may integrate (4) with respect to $p$ from 0 to 1 as a Beta Function, leaving the distribution function of $m_1$ independent of the population probability element $f(x)\,dx$. We get for the distribution of $m_1$

$$\Pr(m_1) = \frac{(2n + 1)!\,(2m)!\,(n + m_1)!\,(n + 2m - m_1)!}{n!\,n!\,m_1!\,(2m - m_1)!\,(2n + 1 + 2m)!}. \tag{5}$$

¹ The multinomial law may be stated briefly as follows: If a trial results in one and only one of the mutually exclusive events $E_1, E_2, \ldots, E_k$, the probability $P$ that in a total of $n$ trials $n_1$ will result in $E_1$, $n_2$ in $E_2$, $\ldots$, $n_k$ in $E_k$ $(n_1 + n_2 + \cdots + n_k = n)$ is given by

$$P = \frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k},$$

where $p_1, p_2, \ldots, p_k$ are the probabilities of a single trial resulting in $E_1, E_2, \ldots, E_k$ respectively.


From (5) a simple recursion relation between $\Pr(m_1)$ and $\Pr(m_1 + 1)$ may be determined, from which the probabilities of the various values of $m_1$ may be rapidly computed. For large samples it can be shown that, under certain regularity conditions, the ratio $[m_1 - E(m_1)]/\sigma_{m_1}$ may be approximated by the normal distribution² with zero mean and unit variance. The derivation is similar to that of the four-interval case, which is taken up in greater detail. It will be found by the use of (4) that the expected value of $m_1$ is $m$, and the variance of $m_1$ is

$$\sigma_{m_1}^{2} = m + \frac{m(2m - 1)(n + 2)}{2n + 3} - m^{2}.$$

Using this information, values of $m_1$ for various


TABLE I 

The Case of Two Intervals 

Lower and upper .01 and .05 percentage points for the distribution of mi 


Sample sizes Critical values of mi 


First 

2n+ 1 

Second 

2m 

Lower 

Upper 

. 


^*(.01) 

”‘(.01) 

11 

10 



9 


41 

40 

10 


28 


101 

100 

34 


62 

66 

;101 


72 

80 


128 

201 


77 

84 


123 

201 

400 

160 

181 


240 

401 

400 

167 

177 


233 

401 


353 

367 



1001 


448 

463 




significance levels may be computed. The .01 and .05 percentage points of $m_1$ for several sample sizes are given in Table I. The values for sample sizes of 10 and 40 are computed directly from the probability law, while the limits for the larger samples are computed by the normal approximation. Thus, for two samples of sizes 101 and 100 respectively, a value of $m_1$ less than 38 would be significant at the .05 level. Similarly, at the upper .05 level, the hypothesis would be rejected if a value of $m_1$ greater than 62 were obtained. The necessity for the upper limits could easily be eliminated by testing with respect to the smaller of $m_1$ and $m_2$. However, for completeness, the upper percentage points


² This statement may be proved by showing that as $m, n \to \infty$ such that $m/n$ = constant, the limit of the moment generating function of the ratio is identical with the moment generating function of the normal distribution with zero mean and unit variance.
are included to show the range of values of $m_1$ in which the hypothesis that the two samples come from the same population may be accepted.
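A sketch of the recursion mentioned above and of the normal approximation used for the larger samples. From (5), $\Pr(m_1 + 1)/\Pr(m_1) = (2m - m_1)(n + m_1 + 1)/[(m_1 + 1)(n + 2m - m_1)]$; the starting value $\Pr(0)$ is computed with `math.lgamma` to avoid overflow. Percentage points come from `statistics.NormalDist`. Reading the tabled .05 and .01 points as one-sided is an assumption; it gives limits of about 38.5 and 61.5 for samples of 101 and 100, consistent with the rounded values 38 and 62 quoted above. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def median_test_pmf(n: int, m: int) -> list:
    """Pr(m1), m1 = 0..2m, from (5), for a first sample of 2n+1 and a second of 2m."""
    lg = math.lgamma  # lg(k + 1) = log k!
    # (5) at m1 = 0 reduces to (2n+1)! (n+2m)! / (n! (2n+2m+1)!)
    p = math.exp(lg(2 * n + 2) + lg(n + 2 * m + 1) - lg(n + 1) - lg(2 * n + 2 * m + 2))
    probs = [p]
    for m1 in range(2 * m):
        p *= (2 * m - m1) * (n + m1 + 1) / ((m1 + 1) * (n + 2 * m - m1))
        probs.append(p)
    return probs


def variance_formula(n: int, m: int) -> float:
    """Variance of m1 as stated in the text."""
    return m + m * (2 * m - 1) * (n + 2) / (2 * n + 3) - m * m


def normal_limits(n: int, m: int, level: float):
    """Lower and upper points of m1 under the normal approximation.
    Assumption: each tabled level is treated as a one-sided tail probability."""
    z = NormalDist().inv_cdf(1.0 - level)
    sd = math.sqrt(variance_formula(n, m))
    return m - z * sd, m + z * sd


probs = median_test_pmf(5, 5)                        # samples of 11 and 10
mean = sum(k * q for k, q in enumerate(probs))
var = sum(k * k * q for k, q in enumerate(probs)) - mean ** 2
print(round(sum(probs), 12), round(mean, 12), round(var, 6), round(variance_formula(5, 5), 6))
print(normal_limits(50, 50, 0.05), normal_limits(50, 50, 0.01))  # samples of 101 and 100
```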


3. The case of four intervals. Let the first sample, of size $4n + 3$, be designated by $(x_1, x_2, \ldots, x_{4n+3})$, assumed drawn from a population with probability element $f(x)\,dx$ and ordered from least to greatest. Then the range of $x$ may be divided into four intervals by $x_{n+1}$, $x_{2n+2}$, and $x_{3n+3}$. The probability element of $x_{n+1}, x_{2n+2}, x_{3n+3}$ is

$$\frac{(4n + 3)!}{n!\,n!\,n!\,n!} \left(\int_{-\infty}^{x_{n+1}} f(x)\,dx\right)^{n} \left(\int_{x_{n+1}}^{x_{2n+2}} f(x)\,dx\right)^{n} \left(\int_{x_{2n+2}}^{x_{3n+3}} f(x)\,dx\right)^{n} \left(\int_{x_{3n+3}}^{\infty} f(x)\,dx\right)^{n} f(x_{n+1})\,dx_{n+1}\,f(x_{2n+2})\,dx_{2n+2}\,f(x_{3n+3})\,dx_{3n+3}.$$


TABLE II

The Case of Four Intervals
.95 and .99 percentage points for the distribution of C

      Sample sizes
  First       Second
  4n + 3      4m          n       m       C.95     C.99

   15          12          3       3      .446     .582
   63          60         15      15      .113     .161
  103         100         25      25      .072     .102


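Let

$$\int_{-\infty}^{x_{n+1}} f(x)\,dx = p_1, \qquad \int_{x_{n+1}}^{x_{2n+2}} f(x)\,dx = p_2, \qquad \int_{x_{2n+2}}^{x_{3n+3}} f(x)\,dx = p_3, \qquad \int_{x_{3n+3}}^{\infty} f(x)\,dx = p_4.$$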

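The probability element of $p_1$, $p_2$, $p_3$, and $p_4$, where $p_4 = 1 - p_1 - p_2 - p_3$, is

$$\Pr(x_{i(n+1)}) = \frac{(4n + 3)!}{n!\,n!\,n!\,n!}\,p_1^{n} p_2^{n} p_3^{n} p_4^{n}\,dp_1\,dp_2\,dp_3. \tag{6}$$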

Now let us consider the second sample, $(x'_1, x'_2, \ldots, x'_{4m})$, of size $4m$. Let the number of observations falling in each of the preassigned intervals be $m_i$ $(i = 1, 2, 3, 4)$, where $m_4 = 4m - m_1 - m_2 - m_3$. The conditional probability of the $m_i$, given the values of $x_{i(n+1)}$, is also determined by the multinomial law:

$$\Pr(m_i \mid x_{i(n+1)}) = \frac{(4m)!}{m_1!\,m_2!\,m_3!\,m_4!}\,p_1^{m_1} p_2^{m_2} p_3^{m_3} p_4^{m_4}. \tag{7}$$


The joint distribution of the $p_i$ and the $m_i$ is then

$$\Pr(x_{i(n+1)}, m_i) = \frac{(4n + 3)!\,(4m)!}{(n!)^{4}\,m_1!\,m_2!\,m_3!\,m_4!}\,p_1^{n + m_1} p_2^{n + m_2} p_3^{n + m_3} p_4^{n + m_4}\,dp_1\,dp_2\,dp_3. \tag{8}$$
To obtain the distribution of the $m_i$ alone, the $p_i$ will be integrated out by the Dirichlet Integral³ formula, giving a distribution which is clearly independent of the population distribution function $f(x)$:

$$\Pr(m_i) = \frac{(4n + 3)!\,(4m)!\,(n + m_1)!\,(n + m_2)!\,(n + m_3)!\,(n + m_4)!}{(n!)^{4}\,m_1!\,m_2!\,m_3!\,m_4!\,(4m + 4n + 3)!}. \tag{9}$$

To find the expected value of the $m_i$, the probability law of $m_1$ will first be derived. The probability function for the value of $x_{n+1}$ is

$$\Pr(x_{n+1}) = \frac{(4n + 3)!}{n!\,1!\,(3n + 2)!}\,p_1^{n}(1 - p_1)^{3n + 2}\,dp_1. \tag{10}$$

Then we have the conditional probability

$$\Pr(m_1 \mid x_{n+1}) = \frac{(4m)!}{m_1!\,(4m - m_1)!}\,p_1^{m_1}(1 - p_1)^{4m - m_1} \tag{11}$$

and

$$\Pr(x_{n+1}, m_1) = \frac{(4n + 3)!\,(4m)!}{n!\,(3n + 2)!\,m_1!\,(4m - m_1)!}\,p_1^{n + m_1}(1 - p_1)^{3n + 2 + 4m - m_1}\,dp_1. \tag{12}$$

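To obtain the expected value of $m_1$, the joint distribution of $m_1$ and $p_1$ is multiplied by $m_1$, summed on $m_1$ from 0 to $4m$, and integrated on $p_1$ from 0 to 1:

$$E(m_1) = \int_{0}^{1} \frac{(4n + 3)!}{n!\,(3n + 2)!}\,p_1^{n}(1 - p_1)^{3n + 2} \left[\sum_{m_1 = 0}^{4m} m_1\,\frac{(4m)!}{m_1!\,(4m - m_1)!}\,p_1^{m_1}(1 - p_1)^{4m - m_1}\right] dp_1. \tag{13}$$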

This interchange of the order of integration and summation is clearly valid. The quantity in brackets will be recognized as the first moment of the binomial distribution $(p_1 + q_1)^{4m}$, where $q_1 = 1 - p_1$. Therefore we have

$$E(m_1) = \int_{0}^{1} 4m\,p_1\,f(p_1)\,dp_1 = 4m\,E(p_1). \tag{14}$$


$E(p_1)$ and the higher moments of $p_1$ are found in the usual way by integrating the distributions as Beta Functions; in particular, $E(p_1) = (n + 1)/(4n + 4) = \tfrac{1}{4}$. From this we see that the expected value of $m_1$ is $m$. By repeating these operations on $m_2$, $m_3$, and $m_4$, it can be seen that $E(m_i) = m$, which also validates the statement made in the introduction.
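Integrating (12) over $p_1$ as a Beta Function gives the marginal law of $m_1$:

$$\Pr(m_1) = \frac{(4n+3)!\,(4m)!\,(n+m_1)!\,(3n+2+4m-m_1)!}{n!\,(3n+2)!\,m_1!\,(4m-m_1)!\,(4n+4m+3)!}.$$

The sketch below evaluates this exactly for a small assumed case ($n = m = 3$, the smallest sizes in Table II). It checks that the probabilities sum to one, that $E(m_1) = m$ as in (14), and that $E(m_1^{2})$ agrees with the expression for $E(m_i^{2})$ given later in this section. The function name is hypothetical.

```python
from fractions import Fraction
from math import factorial as fact


def marginal_m1(n: int, m: int) -> list:
    """Exact Pr(m1), m1 = 0..4m, obtained by integrating (12) over p1."""
    const = Fraction(fact(4 * n + 3) * fact(4 * m),
                     fact(n) * fact(3 * n + 2) * fact(4 * n + 4 * m + 3))
    return [const * Fraction(fact(n + k) * fact(3 * n + 2 + 4 * m - k),
                             fact(k) * fact(4 * m - k))
            for k in range(4 * m + 1)]


n, m = 3, 3
pmf = marginal_m1(n, m)
mean = sum(k * q for k, q in enumerate(pmf))
second = sum(k * k * q for k, q in enumerate(pmf))
formula = m + Fraction(m * (4 * m - 1) * (n + 2), 4 * n + 5)   # E(m_i^2)
print(sum(pmf), mean, second, formula)
```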


³ A discussion of the Dirichlet Integral may be found in Woods, Advanced Calculus, p. 167. It may be stated as follows for the problem in which we are interested:

$$\iiint x^{l-1}\,y^{m-1}\,z^{n-1}\,(1 - x - y - z)^{r-1}\,dx\,dy\,dz = \frac{\Gamma(l)\,\Gamma(m)\,\Gamma(n)\,\Gamma(r)}{\Gamma(l + m + n + r)},$$

where we integrate over the region bounded by $x + y + z = 1$ and the three coordinate planes.
We have previously presented the criterion (1).

The next problem is to find a distribution function to which the distribution of $C$ may be fitted. A reasonable choice appears to be the Pearson Type I curve

$$f(x) = \frac{\Gamma(r + s)}{\Gamma(r)\,\Gamma(s)}\,x^{r-1}(1 - x)^{s-1}. \tag{15}$$

The distribution of $C$ is fitted by equating the first two moments of the two distributions and solving for the constants $r$ and $s$ of the Type I distribution. Using the theorem that the mean value of the sum of variates is equal to the sum of their mean values, we have

$$E(C) = \frac{1}{9m^2}\left\{E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2) - 4m^2\right\}. \tag{16}$$

Also the second moment may be written as

$$\begin{aligned} E(C^2) = \frac{1}{81m^4}\Big[&E(m_1^4) + E(m_2^4) + E(m_3^4) + E(m_4^4) + 16m^4 + 2E(m_1^2m_2^2) \\ &+ 2E(m_1^2m_3^2) + 2E(m_1^2m_4^2) + 2E(m_2^2m_3^2) + 2E(m_2^2m_4^2) \\ &+ 2E(m_3^2m_4^2) - 8m^2\{E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2)\}\Big]. \end{aligned} \tag{17}$$

The expected value of $m_i^2$ is found in the same manner as $E(m_i)$, and here also it can be shown that the $E(m_i^2)$ are all equal. The same procedure holds for $E(m_i^4)$:

$$\begin{aligned} E(m_i^2) &= m + \frac{m(4m-1)(n+2)}{4n+5}, \\ E(m_i^4) &= m + \frac{7m(4m-1)(n+2)}{4n+5} + \frac{6m(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)} \\ &\quad + \frac{m(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)}. \end{aligned} \tag{18}$$


By using the moment generating function of the trinomial distribution, the $E(m_i^2 m_j^2)$ may also be found in a similar manner:

$$\begin{aligned} E(m_i^2 m_j^2) &= \frac{m(4m-1)(n+1)}{4n+5} + \frac{2m(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)} \\ &\quad + \frac{m(4m-1)(4m-2)(4m-3)(n+2)(n+1)(n+2)}{(4n+7)(4n+6)(4n+5)} \qquad (i \ne j). \end{aligned} \tag{19}$$
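The closed forms (18) and (19) can be checked against a direct summation over the probability law (9). The sketch below does this in exact rational arithmetic for a few small $(n, m)$. The helper names are illustrative, and the enumeration is only practical for small samples.

```python
from fractions import Fraction as F
from math import factorial


def pmf(counts, n, m):
    """Probability law (9) of the cell counts (m1, m2, m3, m4)."""
    num = factorial(4 * n + 3) * factorial(4 * m)
    den = factorial(n) ** 4 * factorial(4 * m + 4 * n + 3)
    for k in counts:
        num *= factorial(n + k)
        den *= factorial(k)
    return F(num, den)


def all_counts(total, cells=4):
    """Yield every tuple of `cells` non-negative integers summing to `total`."""
    if cells == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in all_counts(total - first, cells - 1):
            yield (first,) + rest


def closed_forms(n, m):
    """E(m_i^2) and E(m_i^4) from (18), E(m_i^2 m_j^2) from (19)."""
    a, b, c = 4 * m - 1, 4 * m - 2, 4 * m - 3
    d1 = 4 * n + 5
    d2 = (4 * n + 6) * d1
    d3 = (4 * n + 7) * d2
    e2 = m + F(m * a * (n + 2), d1)
    e4 = (m + F(7 * m * a * (n + 2), d1)
          + F(6 * m * a * b * (n + 3) * (n + 2), d2)
          + F(m * a * b * c * (n + 4) * (n + 3) * (n + 2), d3))
    e22 = (F(m * a * (n + 1), d1)
           + F(2 * m * a * b * (n + 1) * (n + 2), d2)
           + F(m * a * b * c * (n + 2) * (n + 1) * (n + 2), d3))
    return e2, e4, e22


def by_enumeration(n, m):
    """The same three moments, summed directly over the law (9)."""
    e2 = e4 = e22 = F(0)
    for k in all_counts(4 * m):
        p = pmf(k, n, m)
        e2 += k[0] ** 2 * p
        e4 += k[0] ** 4 * p
        e22 += k[0] ** 2 * k[1] ** 2 * p
    return e2, e4, e22


for n, m in [(1, 1), (2, 1), (1, 2), (2, 3)]:
    print(n, m, by_enumeration(n, m) == closed_forms(n, m))  # True in each case
```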


As a result we have

$$E(C) = \frac{4}{9m} + \frac{4(4m-1)(n+2)}{9m(4n+5)} - \frac{4}{9}. \tag{20}$$

Let $E(C) = A$, to simplify later relations. Finally

$$\begin{aligned} E(C^2) = \frac{4}{81m^3}\Big[&1 + \frac{7(4m-1)(n+2)}{4n+5} + \frac{6(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)} \\ &+ \frac{(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)} + \frac{3(4m-1)(n+1)}{4n+5} \\ &+ \frac{6(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)} + \frac{3(4m-1)(4m-2)(4m-3)(n+2)^2(n+1)}{(4n+7)(4n+6)(4n+5)} \\ &+ 4m^3 - 8m^2 - \frac{8m^2(4m-1)(n+2)}{4n+5}\Big]. \end{aligned} \tag{21}$$
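Formulas (20) and (21) can be checked by computing $E(C)$ and $E(C^2)$ directly from the law (9). The sketch below assumes $C = \sum_i (m_i - m)^2 / (9m^2)$. That scaling is inferred from (16) and (17), because the defining equation (1) for $C$ lies outside this excerpt. The helper names are illustrative.

```python
from fractions import Fraction as F
from math import factorial


def pmf(counts, n, m):
    """Probability law (9) of the cell counts (m1, m2, m3, m4)."""
    num = factorial(4 * n + 3) * factorial(4 * m)
    den = factorial(n) ** 4 * factorial(4 * m + 4 * n + 3)
    for k in counts:
        num *= factorial(n + k)
        den *= factorial(k)
    return F(num, den)


def all_counts(total, cells=4):
    """Yield every tuple of `cells` non-negative integers summing to `total`."""
    if cells == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in all_counts(total - first, cells - 1):
            yield (first,) + rest


def criterion(counts, m):
    """Assumed form of C: sum of (m_i - m)^2 divided by 9 m^2, as (16) implies."""
    return F(sum((k - m) ** 2 for k in counts), 9 * m * m)


def e_c(n, m):
    """Equation (20)."""
    return F(4, 9 * m) + F(4 * (4 * m - 1) * (n + 2), 9 * m * (4 * n + 5)) - F(4, 9)


def e_c2(n, m):
    """Equation (21)."""
    a, b, c = 4 * m - 1, 4 * m - 2, 4 * m - 3
    d1 = 4 * n + 5
    d2 = (4 * n + 6) * d1
    d3 = (4 * n + 7) * d2
    bracket = (1 + F(7 * a * (n + 2), d1)
               + F(6 * a * b * (n + 3) * (n + 2), d2)
               + F(a * b * c * (n + 4) * (n + 3) * (n + 2), d3)
               + F(3 * a * (n + 1), d1)
               + F(6 * a * b * (n + 1) * (n + 2), d2)
               + F(3 * a * b * c * (n + 2) ** 2 * (n + 1), d3)
               + 4 * m ** 3 - 8 * m ** 2
               - F(8 * m ** 2 * a * (n + 2), d1))
    return F(4, 81 * m ** 3) * bracket


def by_enumeration(n, m):
    """E(C) and E(C^2) summed directly over every outcome of (9)."""
    first = second = F(0)
    for k in all_counts(4 * m):
        p = pmf(k, n, m)
        value = criterion(k, m)
        first += value * p
        second += value ** 2 * p
    return first, second


for n, m in [(1, 1), (2, 1), (1, 2), (3, 2)]:
    print(n, m, by_enumeration(n, m) == (e_c(n, m), e_c2(n, m)))  # True in each case
```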


To simplify later relations we let $E(C^2) = B$.

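The first two moments of the Type I distribution are easily found to be

$$\mu_1' = \frac{r}{r+s} = A, \qquad \mu_2' = \frac{\mu_1'(r+1)}{r+s+1} = B. \tag{22}$$

Solving these two simultaneous equations for $r$ and $s$,

$$r = \frac{A(A-B)}{B-A^2}, \qquad s = \frac{(1-A)(A-B)}{B-A^2}. \tag{23}$$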

A number of percentage points for the Type I distribution have been computed by Miss Catherine Thompson [3]. Using these percentage points, the hypothesis that the two samples come from the same population may be accepted or rejected.

Table II shows the .95 and .99 percentage points of C for three sample sizes. 


4. Summary. The problem considered here is that of devising a simple 
method of testing the hypothesis that two samples are from identical populations
having continuous distribution functions. It may be summarized briefly as 
follows. The first sample is used to establish any desired number of intervals 
into which the observations of the second sample may fall. A test criterion is 
proposed which is based on the deviations of the numbers of elements of the 
second sample which fall in the intervals from the expected values of the respec- 
tive numbers. Two cases are discussed, that of two intervals and that of four 
intervals, making use of the median and quartiles in the first sample to determine the intervals. Tables of 1% and 5% points for several sample sizes of
both cases are given. 
NOTES

This section is devoted to brief research and expository articles, notes on method- 
ology and other short items. 

NOTE ON THE INDEPENDENCE OF CERTAIN QUADRATIC FORMS 

By Allen T. Craig 
University of Iowa 

Various approaches to the problem of the independence of quadratic forms 
in normally and independently distributed variables have been made by R. A. 
Fisher, Cochran, Madow and others. It is the purpose of this note to point 
out a few simple propositions which, in so far as the writer is aware, have not 
had specific mention in the literature. 

1. Independence of certain quadratic forms. Theorem 1: A necessary and 
sufficient condition that two real symmetric quadratic forms, in n normally and 
independently distributed variables, be independent in the probability sense is that 
the product of the matrices of the forms be zero. 

Let the chance variable $x$ be normally distributed with mean zero and unit variance. Let $x_1, x_2, \dots, x_n$ be $n$ independent values of $x$ and let $A$ and $B$ be two real symmetric matrices, each of order $n$. Write $Q_1 = \sum\sum a_{ij}x_ix_j$ and $Q_2 = \sum\sum b_{ij}x_ix_j$, where $\|a_{ij}\| = A$ and $\|b_{ij}\| = B$. It is well known that the generating function of the moments of the joint distribution of $Q_1$ and $Q_2$ can be written

$$G(\lambda, \lambda') = |I - \lambda A - \lambda' B|^{-1/2},$$

so that

$$|I - \lambda A - \lambda' B| \equiv |I - \lambda A|\,|I - \lambda' B|, \tag{1}$$

for all real values of $\lambda$ and $\lambda'$, is necessary and sufficient for the independence of $Q_1$ and $Q_2$.

If $Q_1$ and $Q_2$ are independent, then (1), being true for all real values of $\lambda$ and $\lambda'$, is in particular true for $\lambda = \lambda'$. Thus

$$|I - \lambda(A + B)| \equiv |I - \lambda A|\,|I - \lambda B|. \tag{2}$$

Denote by $r_1$, $r_2$ and $r \le r_1 + r_2$ respectively the ranks of $A$, $B$ and $A + B$. Then $r = r_1 + r_2$, since (2) expresses the identity of two polynomials in $\lambda$ of degrees $r$ and $r_1 + r_2$.

Further, if we write

$$|I - \lambda A| = (1 - \lambda p_1)\cdots(1 - \lambda p_{r_1}), \qquad |I - \lambda B| = (1 - \lambda q_1)\cdots(1 - \lambda q_{r_2}),$$

and $|I - \lambda(A + B)| = (1 - \lambda s_1)\cdots(1 - \lambda s_{r_1+r_2})$, then, because the factorization of polynomials is unique, each $s_j$ can be paired with one of the numbers $p_1, \dots, p_{r_1}, q_1, \dots, q_{r_2}$. Thus, if $Q_1$ and $Q_2$ are independent, the rank of $A + B$ is the sum of the ranks of $A$ and $B$, and the non-zero roots of the characteristic equation of $A + B$ are those of the characteristic equation of $A$
together with those of the characteristic equation of $B$. There exists an appropriately chosen orthogonal matrix $L$ of order $n$ such that $L'(A + B)L$, $L'$ being the conjugate of $L$, is a matrix with the numbers $p_1, \dots, p_{r_1}, q_1, \dots, q_{r_2}$ on the principal diagonal and zeros elsewhere. Then $L'AL$ and $L'BL$ have no overlapping non-zero elements and $L'AL\,L'BL = 0$. But $L' = L^{-1}$, the inverse of $L$. Hence, upon multiplying both members of the preceding equation on the right by $L'$ and on the left by $L$, we have $AB = 0$. Since $A = A'$ and $B = B'$, likewise $BA = 0$.

Conversely, suppose $AB = 0$. Then the matrix $(I - \lambda A)(I - \lambda' B) = I - \lambda A - \lambda' B$. These matrices being equal, their determinants are equal, and the condition (1) for the independence of $Q_1$ and $Q_2$ is satisfied.
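Theorem 1 can be illustrated numerically. The sketch below works in exact rational arithmetic with $n = 3$, taking $A$ as the matrix of $n\bar{x}^2$ and $B$ as that of $\sum (x_i - \bar{x})^2$, so that $AB = 0$. It checks condition (1) only on a finite grid of $\lambda, \lambda'$, so it illustrates the theorem rather than proving it. The matrices and helper names are illustrative choices.

```python
from fractions import Fraction as F


def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


def combo(terms):
    """Entrywise linear combination: the sum of c * M over the (c, M) pairs."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det(M):
    """Determinant by Gaussian elimination in exact arithmetic."""
    M = [[F(x) for x in row] for row in M]
    n, result = len(M), F(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return F(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return result


def condition_one_holds(A, B, grid):
    """Spot-check (1) at every pair (lam, lam2) drawn from `grid`."""
    eye = identity(len(A))
    for lam in grid:
        for lam2 in grid:
            left = det(combo([(1, eye), (-lam, A), (-lam2, B)]))
            right = det(combo([(1, eye), (-lam, A)])) * det(combo([(1, eye), (-lam2, B)]))
            if left != right:
                return False
    return True


n = 3
I = identity(n)
J = [[F(1, n)] * n for _ in range(n)]   # matrix of Q1 = n * xbar**2
D = combo([(1, I), (-1, J)])            # matrix of Q2 = sum of (x_i - xbar)**2
grid = [F(k, 4) for k in range(-4, 5)]

print(matmul(J, D) == combo([(0, I)]))  # True: the product of the matrices is zero
print(condition_one_holds(J, D, grid))  # True
print(condition_one_holds(I, J, grid))  # False: I J = J is not zero
```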

The theorem is readily extended to the case of the mutual independence of 
any finite number of such quadratic forms. 

The product of a non-singular matrix and a matrix of rank $R$ is a matrix of rank $R$. Hence, if $A$ is non-singular and $B \ne 0$, then $AB \ne 0$, and every non-singular quadratic form of the kind here discussed is correlated with every non-identically vanishing quadratic form in the same variables.

2. Conditions for independent Chi-Square distributions. The preceding 
theorem enables one to determine, by multiplication of matrices, whether real 
symmetric quadratic forms in normally and independently distributed variables
are themselves independent in the probability sense. The following theorem 
affords a simple test as to whether the distributions are of the Chi-Square type. 

Theorem 2: Necessary and sufficient conditions that each of two real symmetric 
quadratic forms, in n normally and independently distributed variables with mean
zero and unit variance, be independently distributed as is Chi-Square, are that 
the product of the matrices of the forms be zero and that each matrix equal its own 
square. 

If $Q_1$ and $Q_2$ are independently distributed as is Chi-Square, then $AB = 0$ and each of the non-zero roots of the characteristic equations of $A$ and $B$ is $+1$. For an appropriately chosen orthogonal matrix $L$, of order $n$, $L'AL$ is a matrix with $r_1$ elements on the principal diagonal equal to $+1$, all other elements being zero. For such a matrix it is seen that $(L'AL)(L'AL) = L'A^2L = L'AL$, so that $A^2 = A$. A similar argument shows that $B^2 = B$.

Conversely, if $AB = 0$, then $Q_1$ and $Q_2$ are independent. Further, if $A^2 = A$ and $B^2 = B$, each of the non-zero roots of the characteristic equations of $A$ and $B$ is $+1$. This follows from the fact that the roots of the characteristic equation of the square of any matrix are themselves the squares of the roots of the characteristic equation of that matrix. Since $A$ and $B$ are real and symmetric, the roots under consideration are real. Thus $Q_1$ and $Q_2$ have independent Chi-Square distributions with $r_1$ and $r_2$ degrees of freedom respectively.
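The conditions of Theorem 2 are easy to test mechanically. In the sketch below, `chi_square_pair` (a hypothetical helper) returns the degrees of freedom when $AB = 0$, $A^2 = A$ and $B^2 = B$, and returns `None` otherwise. It uses the fact that the rank of a symmetric idempotent matrix equals its trace. The example matrices, for $n = 3$, are those of $n\bar{x}^2$ and $\sum(x_i - \bar{x})^2$, chosen for illustration.

```python
from fractions import Fraction as F


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def chi_square_pair(A, B):
    """Theorem 2: return (r1, r2) if AB = 0, A^2 = A and B^2 = B, else None.
    For a symmetric idempotent matrix the rank (degrees of freedom) is the trace."""
    n = len(A)
    zero = [[0] * n for _ in range(n)]
    if matmul(A, B) != zero or matmul(A, A) != A or matmul(B, B) != B:
        return None
    return int(sum(A[i][i] for i in range(n))), int(sum(B[i][i] for i in range(n)))


n = 3
J = [[F(1, n)] * n for _ in range(n)]                                 # n * xbar**2
D = [[F(int(i == j)) - J[i][j] for j in range(n)] for i in range(n)]  # sum (x_i - xbar)**2
doubled = [[2 * x for x in row] for row in J]                         # 2 n * xbar**2

print(chi_square_pair(J, D))        # (1, 2)
print(chi_square_pair(doubled, D))  # None: (2J)^2 = 4J, not 2J
```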

This theorem can likewise be extended to any finite number of these quadratic
forms. 

Of special interest is the case of, say $k$, quadratic forms for which the sum of the $k$ matrices is the identity matrix. Thus $A_1 + A_2 + \cdots + A_k = I$. By Theorem 1, it is both necessary and sufficient for the mutual independence of the $k$ forms that $A_uA_v = 0$, $u \ne v$.

Now

$$A_j = I - A_1 - \cdots - A_{j-1} - A_{j+1} - \cdots - A_k$$

and

$$A_jA_j = A_j - A_1A_j - \cdots - A_{j-1}A_j - A_{j+1}A_j - \cdots - A_kA_j,$$

so that, every product $A_uA_j$ with $u \ne j$ being zero, $A_j = A_j^2$. In this particular case it is to be seen that the mutual independence of the forms implies that their several distributions are of the Chi-Square type.
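The argument for the case $A_1 + \cdots + A_k = I$ can be traced on a small example. The sketch below takes $n = 3$ and the hypothetical decomposition $A_1$ (the matrix of $3\bar{x}^2$), $A_2$ (that of $(x_1 - x_2)^2/2$), and $A_3 = I - A_1 - A_2$. It checks that the pairwise products vanish and that the identity $A_jA_j = A_j - \sum_{u \ne j} A_uA_j$ holds. It then confirms that each $A_j$ is idempotent.

```python
from fractions import Fraction as F


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


n = 3
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]
A1 = [[F(1, 3)] * 3 for _ in range(3)]                                        # 3 * xbar**2
A2 = [[F(1, 2), F(-1, 2), F(0)], [F(-1, 2), F(1, 2), F(0)], [F(0)] * 3]       # (x1 - x2)**2 / 2
A3 = [[I[i][j] - A1[i][j] - A2[i][j] for j in range(n)] for i in range(n)]    # A1 + A2 + A3 = I
mats = [A1, A2, A3]
zero = [[0] * n for _ in range(n)]

# Condition of Theorem 1 for mutual independence: all pairwise products vanish.
pairwise_zero = all(matmul(mats[u], mats[v]) == zero
                    for u in range(3) for v in range(3) if u != v)


def craig_rhs(j):
    """A_j - sum over u != j of A_u A_j (equal to A_j A_j whenever the A's sum to I)."""
    result = [row[:] for row in mats[j]]
    for u in range(3):
        if u != j:
            prod = matmul(mats[u], mats[j])
            result = [[result[i][k] - prod[i][k] for k in range(n)] for i in range(n)]
    return result


identity_holds = all(matmul(M, M) == craig_rhs(j) for j, M in enumerate(mats))
idempotent = all(matmul(M, M) == M for M in mats)
print(pairwise_zero, identity_holds, idempotent)  # True True True
```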


A CHARACTERIZATION OF THE NORMAL DISTRIBUTION 

By Irving Kaplansky 
Harvard University 

In 1925 R. A. Fisher gave a geometric derivation of the joint distribution of mean and variance in samples from a normal population (Metron, Vol. 5, pp. 90-104). On examining the argument, however, we find that an (apparently) more general result is actually established: if $f(x_1)\cdots f(x_n)$ is a function $g(m, s)$ of the sample mean $m$ and standard deviation $s$, then the probability density of $m$ and $s$ in samples of $n$ from the population $f(x)$ is $g(m, \dots$ This condition on $f(x)$ is of course satisfied if $f(x)$ is normal; in this note we shall conversely show that for $n \ge 3$ it characterizes the normal distribution. In the proof it will be assumed that $g(m, s)$ possesses partial derivatives of the first order, although a weaker assumption would probably suffice.

Let us for the moment restrict the variables $x_i$ to values such that $f(x_i) > 0$. After a change of notation we have

$$\phi(x_1) + \cdots + \phi(x_n) = h(u, v),$$

where $\phi = \log f$, $u = x_1 + \cdots + x_n$, $v = \tfrac{1}{2}(x_1^2 + \cdots + x_n^2)$. A differentiation yields

$$\phi'(x_i) = h_u + h_v x_i \qquad (i = 1, \dots, n).$$
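Solving two of these equations for $h_v$, we find

$$h_v = \frac{\phi'(x_i) - \phi'(x_j)}{x_i - x_j} \qquad (i \ne j), \tag{1}$$

and, for $n \ge 3$, it follows that the right member of (1) is a constant, say $2A$. The right member involves only $x_i$ and $x_j$, yet it equals the same $h_v$ for every pair. With a third variable available, comparing the pairs $(i, j)$ and $(i, k)$ shows that it cannot depend on $x_j$, and by symmetry it cannot depend on $x_i$ either. Then

$$\phi'(x_i) - 2Ax_i = \phi'(x_j) - 2Ax_j = \text{a constant } B,$$

$$\phi(x) = Ax^2 + Bx + C.$$

We now have $f(x) = e^{Ax^2 + Bx + C}$ whenever $f(x) > 0$; but since $f(x)$ is continuous, this implies $f(x) = e^{Ax^2 + Bx + C}$ everywhere.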
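Fisher's condition can be illustrated with two samples of size $n = 3$ that share the same $u = 6$ and $v = 9$, and hence the same mean and standard deviation: $(0, 3, 3)$ and $(1, 1, 4)$. The normal density gives equal products $f(x_1)f(x_2)f(x_3)$, as the condition requires. The Cauchy density, used here only as an arbitrary non-normal example, does not. For $n = 2$ no such pair of distinct samples exists, because $x_1 + x_2$ and $x_1^2 + x_2^2$ fix the pair up to order. This is one way to see why the result needs $n \ge 3$.

```python
import math


def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def cauchy_pdf(x):
    return 1 / (math.pi * (1 + x * x))


def likelihood(pdf, sample):
    """The product f(x_1) ... f(x_n) for the given density."""
    return math.prod(pdf(x) for x in sample)


def mean_and_sd(sample):
    """Sample mean and standard deviation (divisor n; any fixed divisor would do)."""
    n = len(sample)
    m = sum(sample) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in sample) / n)


s1, s2 = (0, 3, 3), (1, 1, 4)  # both have u = 6 and v = 9
print(mean_and_sd(s1) == mean_and_sd(s2))                                     # True
print(math.isclose(likelihood(normal_pdf, s1), likelihood(normal_pdf, s2)))  # True
print(math.isclose(likelihood(cauchy_pdf, s1), likelihood(cauchy_pdf, s2)))  # False
```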



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of general interest

Personal Items 

Dr. Holbrook Working has been appointed Chief Statistical Consultant on 
Industrial Processes and Products in the Office of Production Research and 
Development of the War Production Board. 

Professor Harold Hotelling of Columbia University was the official representa- 
tive of the Institute of Mathematical Statistics at the Copernican Quadri- 
centennial Celebration which was held in New York City on May 24. 

Dr. Edward B. Olds has taken a position with the Curtiss-Wright Corporation.

Dr. Nilan Norris is a Sergeant with the Fourth Statistical Control Unit of the Fourth Air Force with headquarters at San Francisco, California.

Dr. Edward Helly is with the Signal Corps Training Program at Illinois Institute of Technology.

Dr. C. W. Cotterman is in the United States Army at Camp Grant, Illinois.

Mr. M. D. Bingham has been commissioned an Ensign in the United States Naval Reserve and is stationed at Fort Schuyler, New York.

Lt. George W. Petrie, USNR, is teaching in the Midshipmen’s School at 
Notre Dame, Ind. 


New Members 

The following persons have been elected to membership in the Institute : 

Arias B., Jorge Civ. Eng. (Guatemala) Eng., Rural Electrification Administration, 420 
Locust St., St. Louis, Mo. 

Bailey, A. L. B.S. (Michigan) Stat., American Mutual Alliance, 60 East 42 St., New York, 
N. Y. 

Becker, Harold W. Instr., Mare Island Trainee School. 126 Benson Ave., Vallejo, Calif.

Bernstein, Shirley R. B.S. (Carnegie Inst. Tech.) Res. Asst., United Steelworkers of America, Pittsburgh, Pa. 6601 Beverly Pl.

Bickerstaff, Asst. Prof. Thomas A. M.A. (Mississippi) Univ. of Miss., University, Miss. 

Birnbaum, Asst. Prof. Z. William. Ph.D. (Lwow) Univ. of Wash., Seattle, Wash. 

Brumbaugh, Prof. Martin A. Ph.D. (Pennsylvania) Univ. of Buffalo, Buffalo, N. Y. 

Burrows, Glenn L. B.A. (Michigan State Coll.) Instr., Michigan State Coll., East Lans- 
ing, Mich. 

Cohen, Jozef B. B.S. (Chicago) Sage Fellow in Psychology, Cornell Univ., Ithaca, N. Y. 

Cope, Asso. Prof. T. Freeman. Ph.D. (Chicago) Queens College, Flushing, N. Y. 

Cudmore, Sedley A. M.A. (Oxford) Stat., Dominion Bur. of Stat., Ottawa, Canada. 

Cureton, Edward E. Ph.D. (Columbia) Sr. Personnel Technician, War Dept., RFD 1, Tauxemont, Alexandria, Va.

De Castro, Prof. Lauro S. V. Civ. Eng. (Escola Nacional de Engenharia) Catholic Univ., Rio de Janeiro, Brazil. 62 rua David Campista.

Edwards, G. D. A.B. (Harvard) Dir. of Quality Assurance, Bell Telephone Laboratories, 
463 West St., New York, N. Y. 

Gilford, Kenneth R. Student, Mass. Inst. Tech., Cambridge, Mass. 97 Bay State Rd., Boston, Mass.
Gottfried, Bert A. A.M. (Columbia) Stat. Clerk, 4300 Kaywood Dr., Mt. Rainier, Md.

Hamilton, Prof. Thomas R. Ph.D. (Columbia) Texas A. & M. Coll., College Station, Tex. 

Heide, J. D. M.S. (Iowa) Stat., U. S. Rubber Co., ISU Altoona Ave., Eau Claire, Wisc.

Hilfer, Irma. M.A. (Columbia) Actuary, N. Y. C. Board of Transportation, 165 W. 97 St., 
New York, N. Y. 

Howell, John M. B.A. (UCLA) Stat., Northrop Aircraft Inc., Hawthorne, Calif. 4^40 W. 68 St., Los Angeles, Calif.

Hurwicz, Leonid. L.L.M. (Warsaw) Res. Asso., Cowles Comm., Univ. of Chicago, Chicago, Ill.

Kendall, Maurice G. M.A. (Cambridge) Stat., Chamber of Shipping of the United King- 
dom, Richmond House, Aldenham Rd., Bushey, Eng. 

Klein, Lawrence R. B.A. (California) Teaching Fellow, Mass. Inst. Tech., Cambridge, 
Mass. 

Kuznets, George M. Ph.D. (California) Instr., Giannini Foundation, Univ. of Calif., 
Berkeley, Calif. 

Landau, H. G. M.S. (Carnegie Inst. Tech.) Stat. Analyst, War Dept., Washington, D. C. 
240s 20 St., N.E. 

Langmuir, Charles R. Ed.M. (Harvard) Carnegie Foundation. 4^7 West 59 St., New 
York, N. Y. 

Levy, Henry C. L.L.B. (Fordham) Instr., N. Y. C. C., New York, N. Y. 600 West 116 St. 

Li, Jerome C. R. B.S. (Nanking) Student, Iowa State Coll., Ames, Iowa. 2184 Lincoln 
Way. 

Lieberman, Jacob E. B.S. (Brooklyn Coll.) Jr. Stat., Census Bureau, Washington, D. C. 2422 14 St., N. E.

Martin, Margaret P. M.A. (Minnesota) Instr., Columbia Univ., New York, N. Y. 1280 
Amsterdam Ave. 

Nash, Stanley W. B.A. (Coll. of Puget Sound) San Joaquin Experimental Range, O'Neals, Calif.

Norton, Horace W. Ph.D. (London) Sr. Meteorologist, U. S. Weather Bur., Washington, 
D. C. 8118 North First Rd., Arlington, Va. 

Olds, Edward B. Ph.D. (Pittsburgh) Stat., Curtiss-Wright Corp. 298 Niagara Falls Blvd., Buffalo, N. Y.

Preston, Bernard. C.P.A., 108 Park Ave., New York, N. Y. 

Rosenblatt, David. B.S. (Coll. City of N. Y.) Asst. Stat., 1422 Whittier St., N. W., Wash- 
ington, D. C. 

Sard, Asst. Prof. Arthur. Ph.D. (Harvard) Queens College, Flushing, N. Y. 146-19 
Beech Ave. 

Schapiro, Anne. B.A. (Bryn Mawr) Jr. Analyst, Institute of Applied Econometrics, 
850 W. 57 St., New York, N. Y. 

Simpson, William B. Grad. Student, Columbia Univ., New York, N. Y. 

Springer, Melvin D. M.S. (Illinois) Asst. Instr., Univ. of Illinois, Urbana, Ill.

Stein, Irving. B.S. (Mass. Inst. Tech.) Asso. Stat., War Dept., Washington, D. C. 611 Oglethorpe St.

Stergion, Andrew P. M.S. (Mass. Inst. Tech.) 1st Lt., USA, The Proving Center, Aberdeen Proving Gd., Md.

Sternhell, Arthur I. B.A. (New York) Staff Asst., Metropolitan Life Ins. Co., 1988 E. 
Tremont Ave., Parkchester, N. Y. 

Thompson, Louis T. E. Ph.D. (Clark) Dir. Res. and Dev., Lukas-Harold Corp., In- 
dianapolis, Ind. 840 East Maple Rd. 

Tyler, Asst. Prof. George W. M.A. (Duke) Virginia Polytechnic Inst., Blacksburg, Va. 

Working, Holbrook S. Ph.D. (Wisconsin) Chief Stat. Consultant, War Production 
Board, Washington, D. C. Food Res. Inst., Stanford Univ., Calif. 
The following persons have been elected to Junior membership in the Institute:
Blumenthal, Lydia. Hunter College, New York, N. Y. 1001 Lincoln Pl., Brooklyn, N. Y.
Gunlogson, Lee. Univ. of Minnesota, Minneapolis, Minn. 1906 Third Ave. 

Heacock, Richard R. Oregon State Coll., Corvallis, Ore. P. O. Box 207, Seaside, Ore.
Locatelli, Humbert J. Columbia Univ., New York, N. Y. 44 Seaman Ave. 

Mathisen, Harold C., Jr. Princeton Univ., Princeton, N. J. 4 Middle Dod Hall. 

Murphy, Ray Bradford. Princeton Univ., Princeton, N. J. 28 Godfrey Rd., Upper Montclair, N. J.

Peters, Edward J., Jr. Georgetown Univ., Washington, D. C. 126 St. James Pl., Atlantic City, N. J.

Smith, Joan T. Univ. of Minnesota, St. Paul, Minn. 673 East Nebraska Ave. 



SPECIAL COURSES IN STATISTICAL QUALITY CONTROL 

The application of statistics to quality control is now being furthered in a
program in which the War Production Board and the U. S. Office of Education 
are cooperating to assist statisticians in various industrial areas to provide 
suitable courses of instruction sponsored by their own institutions. 

The general plan of the program has been influenced by two conclusions 
drawn from the experience gained in ESMWT courses carried on by Stanford 
University during 1942-43.¹ These conclusions were: (1) that a short full-time course in statistical quality control tends to be peculiarly effective; and (2) that it is vital to have the initial courses followed by meetings in which the course members gather to report on applications they have made and to receive encouragement and any needed assistance.

The giving of short full-time courses presents a problem of assembling a suitable 
staff, since four instructors will ordinarily be needed. If this problem were solved 
by arranging for a single staff to tour all the principal industrial regions giving 
courses in quality control, the local leadership necessary for establishing wide- 
spread use of statistical methods of quality control in industry would not be 
developed. The program adopted seems to offer an effective solution of these 
problems. 

Under the program now in effect, the War Production Board, through its 
Office of Production Research and Development supplies an experienced person 
to assist with the arrangement of courses and to participate in the instruction.²
Two of the instructors in each course will ordinarily be provided by a local educa- 
tional institution, which will also promote the course and make necessary local 
arrangements through its institutional representative of the Engineering Science 
and Management War Training program. It is not considered necessary that 
the instructors provided by the institution have previous experience with statisti- 
cal quality control provided they are sufficiently competent in the theory of 
sampling, but it is desirable that at least one of them have practical experience 
with quality control. It may often happen that one of the instructors can be a
quality control man from a local industrial establishment. The representative 
of the WPB will assist with arrangements for bringing in one (or, where needed, 
two) additional outside instructors. 

The sponsoring institution's costs for the courses, which do not include the
salary and expenses of the representative of the WPB, may be provided through 
the ESMWT program. The follow-up work with men who have taken the 
initial courses may be arranged also as part of the ESMWT program of the 

¹ A description of these courses offered by Stanford University appeared in the Annals of Mathematical Statistics, March 1943, p. 96.

² At present Professor Holbrook Working is serving in this capacity.
educational institution sponsoring the original course. The follow-up work
should be handled by a local instructor who participated in the original course. 

The two basic courses and the one follow-up course that have already been 
given by Stanford University were conducted under essentially the plan out- 
lined above, except that they did not have the benefit of assistance from the 
WPB. Three courses have thus far (May 25) been arranged under the new 
plan: one sponsored by Rhode Island State College, to be held during May 27 
to June 2 at Newport, and two sponsored by Stanford University, to be held 
respectively in Los Angeles, June 13 to 20, and in San Francisco, June 22 to 29. 
Preliminary steps have been taken toward the arrangement of several additional 
courses. 



REPORT OF THE NEW YORK MEETING OF THE INSTITUTE

A joint meeting between the Institute and the American Society of Mechanical 
Engineers was held on Saturday, May 29, 1943 at the Engineering Societies 
Building, 29 West 39th Street, New York City. Of the ninety-five individuals 
attending the meeting, the following fifty-seven members of the Institute were 
present:

Theodore W. Anderson, K. J. Arnold, Robert E. Bechhofer, B. M. Bennett, C. I. Bliss,
Mary E. Boozer, P. Boschan, A. H. Bowker, Burton H. Camp, A. C. Cohen Jr., H. F. Dodge,
C. Eisenhart, Mary L. Elveback, W. C. Flaherty, H. Goode, John I. Griffin, Charles C.
Grove, Frank E. Grubbs, E. J. Guirbel, Harold Hotelling, J. M. Juran, B. F. Kimball,
Lila Knudsen, Howard Levene, E. Vernon Lewis, Simon Lopata, Frank W. Lynch,
Henry Mann, E. C. Molina, N. Morrison, Philip J. McCarthy, Luis F. Nanni,
Franklin S. Nelson, M. L. Norden, P. S. Olmstead, R. F. Passano, Edward Paulson,
G. A. D. Preinreich, A. C. Rosander, Arthur Sard, Henry Scheffé, Bernice Scherl, Edward M.
Schrock, L. W. Shaw, William B. Simpson, S. G. Small, Arthur Stein, Andrew P. Stergion,
M. Stevens, David F. Votaw Jr., A. Wald, Helen M. Walker, W. A. Wallis, S. S. Wilks,
J. Wolfowitz, L. C. Young.

The general topic of the meeting was Industrial Applications of Statistics. At 
the morning session the following papers were presented, with Professor Harold 
Hotelling presiding: 

1. On the Theory of Runs with some Application to Quality Control, 

J. Wolfowitz. 

2. On the Presentation of Data as Evidence, 

Churchill Eisenhart. 

At the afternoon session, the following papers were presented with Mr. E. C. 
Molina as Chairman: 

1. A Sampling Inspection Plan for Continuous Production.

H. F. Dodge. 

2. Tolerances and Product Acceptability.

L. C. Young. 

A meeting of the Board of Directors was held after the afternoon session. 

Edwin G. Olds

Secretary 
THE COMPARISON OF DIFFERENT SCALES OF MEASUREMENT FOR
EXPERIMENTAL RESULTS¹ ²

By W. G. Cochran
Iowa State College

1. Introduction. In some fields of research, the development of a satisfactory method for measuring the effects of experimental treatments constitutes a difficult problem. The estimation of the vitamin content of preparations of foods furnishes a good example; for most of the vitamins several years of work were required to construct a reliable method of assay. In other cases, where the ideal method for measuring treatment responses is costly or troublesome, a search may be made for a more convenient substitute. Thus in pasture or forage-crop experiments the species composition of a plot may be estimated by eye inspection as a substitute for a complete botanical separation. As a third example we may quote experiments in cookery, where the flavor and quality of the dishes are subject to the whims of human taste. Frequently a panel of judges is employed, each of whom scores the dishes independently. It is not easy to determine how the panel should be chosen, nor how representative its verdicts are of consumer preferences in general.

When such problems are investigated, experiments may be carried out specifically for the purpose of comparing two or more methods or scales of measurement. Where the process of measurement affects only the final stages of the experiment, as in the last two examples quoted above, all that is necessary is to score the same experiment by the various scales under consideration. In comparing two different methods of assaying vitamins, on the other hand, independent experiments are frequently required, the only common feature being that the same set of treatments is tested in both experiments.

In the interpretation of the results of such experiments, two types of compari- 
son are of general interest. One concerns the relations between the scales. It 
may be summed up rather loosely in the question: Are the effects of the treatments the same in all scales? For a more exact formulation, consider the case of two scales, which is probably the most frequent in practice. Let $\xi_{1i}$, $\xi_{2i}$ be the true means of the $i$th treatment as measured on the two scales. We may wish to examine the following hypotheses:

(i) Scales equivalent:

$$\xi_{1i} = \xi_{2i} \qquad (\text{all } i); \tag{1}$$

(ii) Scales equivalent, apart from a constant difference:

$$\xi_{1i} = \xi_{2i} + \epsilon \qquad (\text{all } i); \tag{2}$$

¹ Paper presented at a meeting of the Institute of Mathematical Statistics, Washington, D. C., June 18, 1943.

² Journal Paper No. J-1136 of the Iowa Agricultural Experiment Station, Ames, Iowa. Project 514.
(iii) Scales linearly related:

$$\alpha\xi_{1i} + \beta\xi_{2i} = \gamma \qquad (\text{all } i); \tag{3}$$

(iv) Relation monotonic, but not linear:

$$\xi_{1i} = \phi(\xi_{2i}) \qquad (\text{all } i), \tag{4}$$

where the function $\phi$ is strictly monotonic.

In this case the two scales are mutually consistent in that they place any set of treatments in the same order. The ratio of a treatment difference in one scale to the corresponding difference in the other scale is, however, not constant.

(v) Relation not monotonic: Here the scales do not place the treatments in the same order and consequently are not satisfactory substitutes for each other.

The second question concerns the relative accuracy or sensitivity of the two scales. For practical purposes this question may be put as follows: how many replications are required with the second scale to attain the accuracy given by $r$ replications with the first scale? It is clear that the answer depends both on the experimental errors associated with the scales and on the magnitudes of the treatment effects in the two scales. For example, Coward [1] reports that in the assay of vitamin D, male rats give a higher experimental error than females, yet provide a more accurate assay because they are more responsive. The relative accuracy may be different in different parts of the two scales. This is likely to happen whenever the relation between the scales is of type (iv) above.

This paper gives a preliminary discussion of some of the simpler questions 
raised above, to which recent work in multivariate analysis is applicable. A 
complete solution for small sample work appears to demand considerable further 
development in the distribution theory of multivariate analysis. 

The discussion is confined to the case in which all scales measure the same
experiment. The case where each scale requires a separate experiment may be 
expected to be somewhat simpler, but cannot conveniently be treated as a special 
case of the procedure for a single experiment. 

2. Assumptions. Let $x_1, x_2, \dots, x_p$ denote measurements on the $p$ scales and let $n_1$ and $n_2$ be the numbers of degrees of freedom for treatments and error respectively. The experimental data furnish a joint analysis of variance and covariance of the $p$ variates as follows:

$$\begin{array}{lcc} & \text{d.f.} & \text{Sum of squares or products} \\ \text{Mean} & 1 & m_{ij} \\ \text{Treatments} & n_1 & a_{ij} \\ \text{Error} & n_2 & b_{ij} \end{array} \tag{5}$$


It will be assumed that $x_1, \dots, x_p$ follow a multivariate normal distribution, and that for any pair of variates $x_i$, $x_j$ the error mean covariance $\sigma_{ij}$ is constant throughout the experiment (though it may vary as $i$ and $j$ vary). Thus the quantities $b_{ij}$ follow the standard joint distribution, Wishart [16], of sums of squares and products, while the quantities $m_{ij}$ and $a_{ij}$ follow the corresponding non-central distributions, and the three sets of distributions are independent.

3. Tests for equivalence. If there are only two scales, a test for equivalence is obtained from elementary techniques. An analysis of variance similar to (5) is computed on the differences between the two scales for every observation. If equations (1) hold in the population, the sums of squares for the Mean, Treatments and Error are distributed independently as $(\sigma_{11} + \sigma_{22} - 2\sigma_{12})\chi^2$. The pooled mean square for the Mean and Treatments may therefore be compared with the Error mean square in a variance-ratio test, the degrees of freedom being $(n_1 + 1)$ and $n_2$. If the scales are equivalent apart from a constant difference, the same result is valid for Treatments and Error, while the mean square for the Mean is proportional to a non-central $\chi^2$. Thus separate $z$- or $F$-tests on the Mean and Treatments assist in distinguishing between hypotheses (1) and (2).
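The two-scale test can be sketched as follows. The paper specifies only the degrees of freedom $n_1$ and $n_2$, so the sketch assumes a randomized-blocks layout with $t$ treatments in $b$ blocks, giving $n_1 = t - 1$ and $n_2 = (t-1)(b-1)$. The block sum of squares is removed from the error but not otherwise used. The function name and the data are hypothetical. The sketch returns variance ratios with their degrees of freedom and leaves the lookup of significance points in $F$ (or $z$) tables to the reader.

```python
def equivalence_test(x1, x2):
    """Analysis of variance of the differences d = x1 - x2 for a hypothetical
    randomized-blocks layout: x1[t][j] and x2[t][j] are the scores of
    treatment t in block j on scales 1 and 2."""
    # Differences between the two scales for every observation.
    d = [[u - v for u, v in zip(row1, row2)] for row1, row2 in zip(x1, x2)]
    t, b = len(d), len(d[0])  # numbers of treatments and blocks
    grand = sum(map(sum, d)) / (t * b)
    trt = [sum(row) / b for row in d]
    blk = [sum(d[i][j] for i in range(t)) / t for j in range(b)]

    ss_mean = t * b * grand ** 2                           # 1 d.f.
    ss_trt = b * sum((mt - grand) ** 2 for mt in trt)       # n1 = t - 1 d.f.
    ss_err = sum((d[i][j] - trt[i] - blk[j] + grand) ** 2  # n2 = (t - 1)(b - 1) d.f.
                 for i in range(t) for j in range(b))
    n1, n2 = t - 1, (t - 1) * (b - 1)
    ms_err = ss_err / n2
    return {
        "mean and treatments": ((ss_mean + ss_trt) / (n1 + 1) / ms_err, (n1 + 1, n2)),
        "mean": (ss_mean / ms_err, (1, n2)),
        "treatments": (ss_trt / n1 / ms_err, (n1, n2)),
    }


x1 = [[3, 4], [2, 9]]  # hypothetical scores: 2 treatments x 2 blocks on scale 1
x2 = [[2, 1], [0, 3]]  # the same plots scored on scale 2
for name, (ratio, df) in equivalence_test(x1, x2).items():
    print(name, ratio, df)
# mean and treatments 20.0 (2, 1)
# mean 36.0 (1, 1)
# treatments 4.0 (1, 1)
```

The pooled ratio tests hypothesis (1). The separate "mean" and "treatments" ratios correspond to the separate tests that help distinguish (1) from (2).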
More than two scales. Let $\xi_{it}$ be the true mean of the $t$th treatment as measured on the $i$th scale. The first two hypotheses may now be written respectively:

$$\xi_{it} = \xi_t, \tag{1'}$$

$$\xi_{it} = \xi_t + \epsilon_i, \tag{2'}$$

for $i = 1, 2, \dots, p$. The quantities $\epsilon_i$, whose sum may be assumed zero, measure the constant differences among the scales.

If the interactions of all components with Scales are computed, the analysis of variance extends formally, with the following separation of degrees of freedom:

$$\begin{array}{lc} & \text{d.f.} \\ \text{Mean} \times \text{Scales} & p - 1 \\ \text{Treatments} \times \text{Scales} & n_1(p - 1) \\ \text{Error} & n_2(p - 1) \end{array} \tag{6}$$

The three lines in the analysis play the same roles as before in relation to hypotheses (1') and (2'). When $p > 2$, however, it may be shown that the three sums of squares are not distributed as multiples of $\chi^2$ unless (i) all scales have the same error variance and (ii) every pair of scales has the same correlation coefficient. Where these conditions are reasonably well satisfied, as happens possibly when experienced judges employ a similar scoring system, the above analysis supplies approximate tests. But with scales which differ widely in their experimental errors or in their degrees of intercorrelation, the validity of variance-ratio tests is open to more serious question.

In order to obtain an exact test, we may note that hypothesis (1') is closely related to the Wilks-Lawley hypothesis (Wilks [15], Lawley [9], Hsu [7]) that the means of $k$ populations are all equal. If each treatment denotes a separate population, the Wilks-Lawley hypothesis states that

$$\xi_{it} = \xi_i \qquad (t = 1, 2, \dots, n_1 + 1). \tag{7}$$
Since this differs from (1') only in the interchange of the letters $i$ and $t$, it is clear that the two hypotheses may be subjected to the same kind of test.

For the details of the procedure we first divide the comparisons among scales into $(p - 1)$ single comparisons by the introduction of a set of variates

$$y_i = \sum_{j=1}^{p} \rho_{ij}\,x_j \qquad (i = 1, 2, \dots, p - 1). \tag{8}$$

Any set of $\rho$'s may be chosen, provided that they are linearly independent and that

$$\sum_{j=1}^{p} \rho_{ij} = 0 \qquad (i = 1, 2, \dots, p - 1). \tag{9}$$

Thus with three scales we might use $y_1 = x_1 - x_2$, $y_2 = x_1 - x_3$, or $y_1 = 2x_1 - x_2 - x_3$, $y_2 = x_2 - x_3$.

The next step is to compute an analysis of variance and covariance of the (p − 1) variates $y_f$, as follows:

(10)                 d.f.      Sums of squares or products
    Mean             1         $m'_{ij}$
    Treatments       $n_1$     $a'_{ij}$
    Error            $n_2$     $b'_{ij}$


If hypothesis (1') holds, it follows from (9) that the three sets of quantities $m'_{ij}$, $a'_{ij}$ and $b'_{ij}$ all follow the standard joint distribution for sums of squares and products. Hence Wilks' test (Wilks [15], Pearson and Wilks [11], Hsu [7]) for the equality of the means of k populations may be applied. For a single test of hypothesis (1') we may use

(11)  $W = \frac{|b'_{ij}|}{|b'_{ij} + m'_{ij} + a'_{ij}|}$.

As before, if W is significant we may test whether the deviation is due to constant differences or to other types of difference among the scales by calculating

(12)  $W_m = \frac{|b'_{ij}|}{|b'_{ij} + m'_{ij}|}$

and

(13)  $W_t = \frac{|b'_{ij}|}{|b'_{ij} + a'_{ij}|}$.


The flexibility of analysis of variance tests is not sacrificed; in particular we 
may test any desired subgroup of the treatments or of the scales. When there 
are only two scales the tests reduce to those given in section 3. 

The tests are invariant under homogeneous linear transformations of the y's,



COMPARISON OF SCALES 




which explains why the form of the subdivision of the scale comparisons is immaterial. In fact for purposes of computation it is not necessary to introduce the y's. By taking a simple transformation and expressing $a'_{ij}$ in terms of $a_{ij}$, etc., we may express W directly in terms of the x's, as follows:

$W = \frac{\sum_i \sum_j B_{ij}}{\sum_i \sum_j (B + M + A)_{ij}}$,

where $B_{ij}$, $(B + M + A)_{ij}$ are respectively the cofactors of the matrices $(b_{ij})$, $(b_{ij} + m_{ij} + a_{ij})$. Analogous expressions hold for $W_m$ and $W_t$. In practice it will often be preferable to compute the y's in order that particular comparisons among the scale variates may be examined in detail.
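The cofactor form of W can be checked numerically. The sketch below is a minimal illustration that assumes NumPy and uses randomly generated sums of squares and products; all names are hypothetical. It computes W from two different contrast sets that satisfy (9), and then from the sums of all cofactors. The second calculation relies on the identity that the sum of all cofactors of a nonsingular matrix equals its determinant times the sum of the elements of its inverse.

```python
import numpy as np

def random_sscp(rng, p, df):
    """Random p x p sums of squares and products with df degrees of freedom (X'X)."""
    x = rng.standard_normal((df, p))
    return x.T @ x

def cofactor_sum(mat):
    """Sum of all cofactors, 1' adj(mat) 1, computed as det(mat) * sum(inv(mat))."""
    return np.linalg.det(mat) * np.linalg.inv(mat).sum()

rng = np.random.default_rng(1)
p, n1, n2 = 4, 5, 20
b = random_sscp(rng, p, n2)     # Error line
m = random_sscp(rng, p, 1)      # Mean line (1 d.f., so rank 1)
a = random_sscp(rng, p, n1)     # Treatments line
total = b + m + a

def w_from_contrasts(L):
    """W of (11) computed from the y-variates y = L x."""
    return np.linalg.det(L @ b @ L.T) / np.linalg.det(L @ total @ L.T)

# Two admissible contrast sets satisfying (9): rows sum to zero, rank p - 1.
L1 = np.hstack([np.eye(p - 1), -np.ones((p - 1, 1))])           # y_f = x_f - x_p
L2 = np.array([[1, -1, 0, 0], [1, 1, -2, 0], [1, 1, 1, -3]], float)

w1, w2 = w_from_contrasts(L1), w_from_contrasts(L2)
w_cof = cofactor_sum(b) / cofactor_sum(total)
print(w1, w2, w_cof)
```

The two contrast sets give the same value, in line with the invariance noted in the text.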

The form of the frequency distribution has been worked out by Wilks [15]. For small values of $n_1$ and p, the test of significance can be referred to the recent tables of the significance levels of the incomplete Beta-function, Thompson [13], or to variance-ratio tables. Such cases are listed below, from Wilks [15] and Hsu [7]. In our notation, $\nu_1$ is taken as $(n_1 + 1)$ in equation (11), as 1 in equation (12) and as $n_1$ in equation (13).

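p = 3, $\nu_1 \ge 1$:  $f(W) \propto W^{(n_2-3)/2}(1 - W^{1/2})^{\nu_1 - 1}$;  $\frac{(n_2 - 1)(1 - W^{1/2})}{\nu_1 W^{1/2}}$ : $F\{2\nu_1,\ 2(n_2 - 1)\}$.

$\nu_1 = 1$:  $f(W) \propto W^{(n_2-p)/2}(1 - W)^{(p-3)/2}$;  $\frac{(n_2 - p + 2)(1 - W)}{(p - 1)W}$ : $F\{p - 1,\ n_2 - p + 2\}$.

This distribution applies to all tests made on the Mean, equation (12), and all cases where a single degree of freedom is isolated from the treatment comparisons.

$\nu_1 = 2$:  $f(W) \propto W^{(n_2-p)/2}(1 - W^{1/2})^{p-2}$;  $\frac{(n_2 - p + 2)(1 - W^{1/2})}{(p - 1)W^{1/2}}$ : $F\{2(p - 1),\ 2(n_2 - p + 2)\}$.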

A tabulation of the distributions for four and five scales would be useful. Hsu [7] has shown that as $n_2$ becomes large, the distribution of $-n_2 \log W$ tends to that of $\chi^2$ with $\nu_1(p - 1)$ degrees of freedom. In general, this approximation does not agree very well with the exact distributions above unless $n_2$ exceeds 60.
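The exact transformations listed above can be collected in a small helper. This is a sketch that covers only the three special cases given in the text, and the function names are hypothetical. Turning F into a significance level still requires variance-ratio tables or a statistical library, which is not used here.

```python
import math

def wilks_F(W, p, n2, nu1):
    """Exact variance-ratio form of W for the special cases listed in the text.

    Returns (F, df1, df2). Handles nu1 = 1 and nu1 = 2 for any p, and p = 3
    for any nu1 >= 1; other combinations have no listed exact form.
    """
    if nu1 == 1:
        return (n2 - p + 2) * (1 - W) / ((p - 1) * W), p - 1, n2 - p + 2
    if nu1 == 2:
        s = math.sqrt(W)
        return (n2 - p + 2) * (1 - s) / ((p - 1) * s), 2 * (p - 1), 2 * (n2 - p + 2)
    if p == 3:
        s = math.sqrt(W)
        return (n2 - 1) * (1 - s) / (nu1 * s), 2 * nu1, 2 * (n2 - 1)
    raise ValueError("no exact F form listed for this (p, nu1)")


def hsu_chi2(W, p, n2, nu1):
    """Large-sample approximation: -n2 log W is referred to chi-square with nu1 (p - 1) d.f."""
    return -n2 * math.log(W), nu1 * (p - 1)


print(wilks_F(0.25, 4, 10, 2))
print(hsu_chi2(0.5, 3, 61, 2))
```

For p = 3 and $\nu_1 = 2$, the $\nu_1 = 2$ form and the p = 3 form coincide, which is a useful internal check on the table.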


5. Interpretation as a problem in canonical correlations. As an introduction to the methods that will be used in testing the hypothesis of linearity, we may note that hypotheses (1') and (2') can be described in terms of canonical correlations. Fisher [5] has pointed out that the roots θ of the equation

(15)  $|a_{ij} - \theta(a_{ij} + b_{ij})| = 0$

are the squares of the sample canonical correlations between the x-variates and a set of $n_1$ dummy variates which represent the $n_1$ degrees of freedom among treatments. In order to obtain the corresponding equation for the population correlations, we may suppose that $n_1$ and p remain constant while the number of replicates r' and consequently $n_2$ increase without limit. After the removal of a common factor r', equation (15) becomes

(16)  $|\psi_{ij} - \rho^2(\psi_{ij} + \nu\sigma_{ij})| = 0$,

where

(17)  $\psi_{ij} = \sum_t (\xi_{it} - \bar\xi_i)(\xi_{jt} - \bar\xi_j)$.

The value of the coefficient ν depends on the type of experimental design. For a randomized block layout, $\nu = n_1$, and for a simple group comparison $\nu = n_1 + 1$.

Now if hypothesis (2') is true, i.e., $\xi_{it} = \xi_t + \epsilon_i$, then $\xi_{it} - \bar\xi_i = \xi_t - \bar\xi$ for every i (where $\bar\xi$ is the mean of the $\xi_t$), so that $\psi_{ij}$ is independent of i and j. In this event equation (16) has (p − 1) roots $\rho^2$ which are identically zero. The remaining root corresponds to the best discriminant function, Fisher [5], and does not vanish unless the treatments have no effects on any of the x-variates.

Let $\sum_i \beta_i x_i$ be a population canonical variate for the scale variables. The coefficients $\beta_i$ satisfy the equations

(18)  $\sum_j \beta_j[\psi_{ij} - \rho^2(\psi_{ij} + \nu\sigma_{ij})] = 0$,  i = 1, ..., p.

For a zero root $\rho^2 = 0$ we have $\psi_{ij}$ = constant. Hence if a zero root is substituted, equation (18) degenerates into

(19)  $\beta_1 + \beta_2 + \cdots + \beta_p = 0$.

To summarize, hypothesis (2') specifies that (i) (p − 1) of the population canonical correlations vanish and (ii) any variate $\sum_i \beta_i x_i$ is a canonical scale variate corresponding to a zero root, provided that equation (19) is satisfied. Analogous results hold for hypothesis (1'); in this case we replace the Treatments line of the analysis of variance by the (Treatments + Mean) line.
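The statement about zero roots can be illustrated numerically. This sketch assumes NumPy, and the treatment means, scale constants and error covariance are arbitrary made-up values. The true means are generated to satisfy (2'), ψ is formed from (17), and the roots of (16) are obtained as the eigenvalues of $(\psi + \nu\sigma)^{-1}\psi$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_treat, nu = 4, 6, 5

# True means obeying (2'): xi[i, t] = xi_t + eps_i (arbitrary values)
xi_t = rng.normal(size=n_treat)
eps = rng.normal(size=p)
xi = xi_t[None, :] + eps[:, None]

dev = xi - xi.mean(axis=1, keepdims=True)
psi = dev @ dev.T                              # psi_ij of (17)

z = rng.standard_normal((10, p))
sigma = z.T @ z / 10                           # an arbitrary error covariance matrix

# Roots rho^2 of |psi - rho^2 (psi + nu sigma)| = 0, i.e. eigenvalues of (psi + nu sigma)^(-1) psi
rho2 = np.sort(np.linalg.eigvals(np.linalg.solve(psi + nu * sigma, psi)).real)
print(rho2)    # p - 1 roots at (numerical) zero, one positive root

# Any beta whose elements sum to zero, as in (19), annihilates psi
beta = np.array([1.0, -2.0, 0.5, 0.5])
print(psi @ beta)
```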

6. Test for linear relationship: two scales. We may assume $n_1 \ge 2$; otherwise no test of linearity is possible. If the values of α, β and γ in equations (3) are known, the problem can be reduced to that of testing hypothesis (1) or (2). Since this case is unlikely to be encountered frequently in practice, further details are omitted.

When α, β and γ are unknown, we may theoretically replace the variates $x_1$ and $x_2$ by $v_1 = \alpha x_1 + \beta x_2$ and $v_2 = \mu_1 x_1 + \mu_2 x_2$, where $\mu_1$ and $\mu_2$ are chosen so that $v_1$ and $v_2$ are independently distributed. If hypothesis (3) holds, it follows from (17) that in terms of the v's, $\psi_{11} = \psi_{12} = 0$. Since in addition $\sigma_{12} = 0$, the two roots of equation (16) are

(20)  $\rho_2^2 = 0$ and $\rho_1^2 = \psi_{22}/(\psi_{22} + \nu\sigma_{22})$.
Thus hypothesis (3) implies that one of the population canonical correlations vanishes. Unlike the previous case, however, we cannot construct the corresponding canonical variate, which requires knowledge of α and β.

The selection of a sample test criterion raises some difficulties. Pending further elucidation of the problem, the natural choice seems to be the square $r_2^2$ of the lower sample canonical correlation, or the equivalent quantity $h_2 = r_2^2/(1 - r_2^2)$, where $h_2$ is the lower root of the equation

(21)  $|a_{ij} - h b_{ij}| = 0$.

It appears likely, however, that $r_2^2$ and $h_2$ are not sufficient estimates of the corresponding population parameters.
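The roots of (21) and (15) are generalized eigenvalues and are easy to compute. The sketch below assumes NumPy and uses randomly generated 2 × 2 Treatments and Error matrices in place of real data. It obtains both sets of roots and confirms the relation $\theta = h/(1 + h)$ between them, so that $h_2 = r_2^2/(1 - r_2^2)$.

```python
import numpy as np

def roots_h(a, b):
    """Roots h of |a_ij - h b_ij| = 0 (equation 21), ascending."""
    return np.sort(np.linalg.eigvals(np.linalg.solve(b, a)).real)

def roots_theta(a, b):
    """Roots theta of |a_ij - theta (a_ij + b_ij)| = 0 (equation 15), ascending."""
    return np.sort(np.linalg.eigvals(np.linalg.solve(a + b, a)).real)

rng = np.random.default_rng(3)
n1, n2 = 5, 24
xa = rng.standard_normal((n1, 2)); a = xa.T @ xa   # Treatments sums of squares and products
xb = rng.standard_normal((n2, 2)); b = xb.T @ xb   # Error sums of squares and products

h = roots_h(a, b)
r2 = roots_theta(a, b)         # squared sample canonical correlations
h2 = h[0]                      # the lower root: the test criterion h_2
print(h2, n2 * h2)             # for large n2, n2*h2 is referred to chi-square with n1 - 1 d.f.
```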

When $n_2$ is large, Hsu [8] has shown that the distribution of $n_2 h_2$ tends to that of $\chi^2$ with $(n_1 - 1)$ degrees of freedom. A considerable advance towards the small-sample distribution is obtainable from Madow [10], who developed an expression for the exact distribution of $r_1^2$ and $r_2^2$ when one of the population correlations is different from zero. In our notation this result, which is an important generalization of the distribution found by Fisher [5] and Girshick [6], may be written as follows:


ni — 3 


n2““3 




42r(Mi - 2) ! (n2 - 2) ! 


( 22 ) 


„ /ni + ns ni + ns ni s \ . 


X (1 - pi) V(r? - y){y -1!) 


where $\rho_1$ is the non-vanishing population correlation. It is evident from the form of (22) that the distribution of $r_2^2$ or $h_2$ involves $\rho_1$. The conditional distribution of $h_2/h_1$ may be relatively insensitive to changes in $\rho_1$, though even this distribution does not seem entirely independent of $\rho_1$.

When $\rho_1$ is unity, the small-sample distribution of $h_2$ is that of the ratio of two independent sums of squares, i.e., $h_2 = (n_1 - 1)e^{2z}/n_2$, with $(n_1 - 1)$ and $n_2$ degrees of freedom. This result is a particular case of a more general result proved in section 8. From (20) it is seen that $\rho_1$ is close to unity when $\psi_{22}$ is large relative to $\sigma_{22}$, i.e., when the real differences among the treatments are large relative to the experimental errors. In the absence of a usable exact solution, the F-distribution may be a better approximation than the large-sample distribution of $h_2$ for data where $r_1$ is found to be close to unity, though proof of this statement is not yet available.

If it is desired to test hypothesis (3) with the additional assumption that γ = 0, we replace $a_{ij}$ by $(a_{ij} + m_{ij})$ in equation (21) for $h_2$, and $n_1$ by $(n_1 + 1)$ in the distribution theory.


7. Connection with the method of least squares. The previous approach has an interesting connection with the method of least squares. We are required to test the linearity of relationship between $(n_1 + 1)$ pairs of means $(\bar x_{1t}, \bar x_{2t})$. Both variates are subject to error and the errors are correlated; with r' replications the population variances and covariance of these means are $\sigma_{11}/r'$, $\sigma_{22}/r'$ and $\sigma_{12}/r'$. For these unknown quantities we have sample estimates $b_{11}/n_2r'$, $b_{22}/n_2r'$ and $b_{12}/n_2r'$ respectively, derived from the Error line of the analysis of variance.

The procedure suggested by the method of least squares is to estimate the parameters of the line and use the deviations of the points $(\bar x_{1t}, \bar x_{2t})$ from the line for a test of linearity. If the population variances were known, the unknown quantities α, β, γ and $\xi_{it}$ would be estimated by minimizing the quadratic form

(23)  $\sigma^{11}\sum_{t=1}^{n_1+1} r'(\bar x_{1t} - \xi_{1t})^2 + 2\sigma^{12}\sum_{t=1}^{n_1+1} r'(\bar x_{1t} - \xi_{1t})(\bar x_{2t} - \xi_{2t}) + \sigma^{22}\sum_{t=1}^{n_1+1} r'(\bar x_{2t} - \xi_{2t})^2$,

subject to the linear relations (3). Here $(\sigma^{ij})$ is the matrix inverse to $(\sigma_{ij})$. On substitution of the estimates, expression (23), which is positive definite, would serve as a "sum of squares" of deviations from the line and therefore as a test criterion. This criterion is of course a direct generalization of the weighted sum of squares which is used when the errors are independent.

Van Uven [14] gave an elegant method by which the sum of squares of deviations can be found directly, before solving for any of the unknown quantities. In our notation he showed that the sum of squares of deviations is the smaller root $H_2$ of the equation

(24)  $|a_{ij} - H\sigma_{ij}| = 0$,

where $a_{ij}$ is as before the treatments sum of squares or products.

Suppose that in default of knowledge of the $\sigma_{ij}$ we derive the weights from the sample estimates $b_{ij}/n_2$; i.e., we minimize (23) with $b^{ij}$ in place of $\sigma^{ij}$, where $(b^{ij}) = (b_{ij}/n_2)^{-1}$. In this case the method of Van Uven shows that the sum of squares of deviations from the best-fitting line is the smaller root $H_2$ of the equation

(25)  $|a_{ij} - H b_{ij}/n_2| = 0$.


Comparing (25) with (21) we find $H_2 = n_2 h_2$. Consequently the least squares approach, with sample weights substituted in (23) for the unknown true weights, leads to $h_2$ as a test criterion. Further, Hsu's [8] proof that the distribution of $n_2 h_2$ tends to $\chi^2$ with $(n_1 - 1)$ degrees of freedom establishes for this case the standard least-squares result for the distribution of the residual sum of squares. That result states that when the population weights are known, the residual sum of squares is distributed as $\chi^2$ with degrees of freedom equal to the number of points, $2(n_1 + 1)$, minus the number of independent unknowns, $(n_1 + 3)$, that is, with $n_1 - 1$ degrees of freedom. By a transformation of the x-variates to independent variables, this result can be obtained alternatively from a theorem by Deming [2].
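Van Uven's result can be checked against a direct minimization. Consider a line $\alpha x_1 + \beta x_2 = \gamma$ with direction $w = (\alpha, \beta)$. A short derivation, not given in the text, shows that minimizing (23) over the $\xi_{it}$ on that line and over γ leaves $w'aw/w'\sigma w$. The residual sum of squares is therefore the smallest value of this ratio over all directions. The sketch below assumes NumPy, and the means, r' and σ are invented values for illustration. It compares a brute-force search over directions with the smaller root of (24).

```python
import numpy as np

rng = np.random.default_rng(4)
n_treat, r_prime = 6, 3
means = rng.normal(size=(n_treat, 2)) * [2.0, 1.0]     # treatment means (xbar_1t, xbar_2t)
dev = means - means.mean(axis=0)
a = r_prime * dev.T @ dev                               # Treatments sums of squares and products
sigma = np.array([[1.0, 0.4], [0.4, 0.5]])              # error covariance, assumed known

# Van Uven (24): residual sum of squares = smaller root of |a - H sigma| = 0
H2 = np.sort(np.linalg.eigvals(np.linalg.solve(sigma, a)).real)[0]

# Brute force over line directions w = (cos t, sin t): minimised (23) = w'aw / w'sigma w
theta = np.linspace(0.0, np.pi, 200_001)
w = np.stack([np.cos(theta), np.sin(theta)])
q = np.einsum("it,ij,jt->t", w, a, w) / np.einsum("it,ij,jt->t", w, sigma, w)
print(H2, q.min())
```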
8. Test for linear relationship: more than two scales. The extension of hypothesis (3) to the case of p scales can be expressed by means of the equations

(3')  $\alpha_i\xi_{1t} + \beta_i\xi_{it} = \gamma_i$,  $(i = 2, \ldots, p;\ t = 1, \ldots, n_1 + 1)$.

The equations, $(p - 1)(n_1 + 1)$ in number, postulate a linear relation between $x_1$ and every other variate and consequently imply a linear relation between any pair of variates $x_i$ and $x_j$.

Consider the variates $v_i = \alpha_i x_1 + \beta_i x_i$, $(i = 2, \ldots, p)$. For $v_1$ we choose the linear function of the x's which is independent of $v_2, \ldots, v_p$. Thus in equation (16) for the population canonical correlations we have $\psi_{ij} = 0$ whenever $i \ge 2$ or $j \ge 2$, and $\sigma_{1j} = 0$, $(j > 1)$. It follows that all roots of equation (16) are zero except one, the non-vanishing root being $\rho_1^2 = \psi_{11}/(\psi_{11} + \nu\sigma_{11})$. If each treatment denotes a separate population, hypothesis (3') is therefore identical with Fisher's hypothesis [4] that the populations are collinear.

As a test criterion for this hypothesis Fisher has suggested the sum of the roots of equation (21), excluding the highest root, i.e., $V' = \sum h_i = \sum r_i^2/(1 - r_i^2)$. If $n_1 \ge p$ the sum extends over (p − 1) roots, while if $n_1 < p$ the sum extends over $(n_1 - 1)$ roots. For computational purposes it may be more expeditious to form this sum by subtraction. Hsu [7] has pointed out that the sum of all roots is given by $V = \sum_i\sum_j b^{ij}a_{ij}$, which is obtained readily when the inverse $(b^{ij})$ of $(b_{ij})$ has been calculated. The largest root of (21) is then found and subtracted from V.
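The subtraction route to Fisher's criterion takes only a few lines. The following sketch assumes NumPy and uses simulated matrices and hypothetical names. It computes V from the inverse of $(b_{ij})$, subtracts the largest root of (21), and checks the result against the direct sum of the remaining roots.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n1, n2 = 3, 6, 30
xa = rng.standard_normal((n1, p)); a = xa.T @ xa   # Treatments
xb = rng.standard_normal((n2, p)); b = xb.T @ xb   # Error

b_inv = np.linalg.inv(b)                 # (b^{ij})
V = np.sum(b_inv * a)                    # sum of all roots: sum_ij b^{ij} a_ij
h = np.sort(np.linalg.eigvals(b_inv @ a).real)
V_prime = V - h[-1]                      # Fisher's criterion: drop the largest root
print(V_prime, n2 * V_prime)             # large n2: n2*V' ~ chi-square, (p-1)(n1-1) d.f.
```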

Fisher [4] also suggested that when equations (3') hold, the distribution of $n_2V'$ is approximately that of $\chi^2$ with $(p - 1)(n_1 - 1)$ degrees of freedom. This result has been confirmed by Hsu [8] as the limiting form of the V' distribution when $n_2$ tends to infinity. As in the case of two scales, the small-sample distribution is as yet unknown; it presumably contains $\rho_1$, the non-vanishing correlation, as a nuisance parameter.

Some progress towards the small-sample distribution can be made without difficulty in the case where $\rho_1 = 1$. For then $v_1$ must have a zero Error sum of squares in every sample from the population, i.e., $v_1$ is constant within any given treatment. Consequently (i) $b_{1i} = 0$ for i = 1, ..., p, and (ii) $a_{1j}^2/a_{11}$ is a single degree of freedom from the Treatments sum of squares of $v_j$. On account of conditions (i), equation (21) reduces to

(26)  $\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{12} & a_{22} - hb_{22} & \cdots & a_{2p} - hb_{2p} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} - hb_{2p} & \cdots & a_{pp} - hb_{pp} \end{vmatrix} = 0$.

Subtract $a_{1i}/a_{11}$ times the first row from the ith row, for i = 2, ..., p. The determinant becomes $a_{11}$ times a (p − 1)-rowed determinant of degree p − 1 in h, so one root is infinite; the rest are the roots of the equation

(27)  $|a''_{ij} - h b_{ij}| = 0$,  i, j = 2, ..., p,

where $a''_{ij} = a_{ij} - a_{1i}a_{1j}/a_{11}$.
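The reduction from (26) to (27) can be verified numerically. This sketch assumes NumPy and uses simulated matrices. The first row and column of $(b_{ij})$ are set to zero to mimic the case $\rho_1 = 1$. The roots of (27) are computed from $a''_{ij}$ and substituted back into the full determinant of (26), which should vanish.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n1, n2 = 4, 7, 25
xa = rng.standard_normal((n1, p)); a = xa.T @ xa
xb = rng.standard_normal((n2, p - 1))
b = np.zeros((p, p)); b[1:, 1:] = xb.T @ xb       # rho_1 = 1: first row/column of b vanish

# (27): a''_ij = a_ij - a_1i a_1j / a_11 for i, j = 2, ..., p
a2 = a[1:, 1:] - np.outer(a[0, 1:], a[0, 1:]) / a[0, 0]
h = np.linalg.eigvals(np.linalg.solve(b[1:, 1:], a2)).real

# Each root of (27) is a root of (26): det(a - h b) vanishes there.
dets = [np.linalg.det(a - hk * b) for hk in h]
print(h, dets)
```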
If hypothesis (3') holds, the quantities $a''_{ij}$ follow the Wishart distribution [16] with $(n_1 - 1)$ degrees of freedom. Hence the joint distribution of $h_2, \ldots, h_p$ (or of $h_2, \ldots, h_{n_1}$) is that which is obtained when all the population canonical correlations vanish, with $(n_1 - 1)$ in place of $n_1$. For $n_1 \ge p$, the distribution function (apart from the constant term) is:

(28)  $\prod_{i=2}^{p}\left[h_i^{(n_1-p-1)/2}(1 + h_i)^{-(n_1+n_2-1)/2}\right]\prod_{2 \le i < j \le p}|h_i - h_j|$.

For two scales (p = 2) we reach the result mentioned in section 6, that $V' = h_2$ is distributed as $(n_1 - 1)e^{2z}/n_2$. This result can also be obtained directly from (27). When p = 3, the distribution of V' is obtainable from a result by Hsu [7].


9. Measures of relative sensitivity. We propose to discuss briefly the estimation of the relative sensitivity of two scales and to indicate the types of distribution that are involved. If there are only two treatments, t, t', an appropriate definition of the true sensitivity of the ith scale is

(29)  $\frac{(\xi_{it} - \xi_{it'})^2}{2\sigma_{ii}}$,

or some simple function of this quantity. In justification, we may observe that for a fixed number of replicates, the power function of the t-test in the ith scale depends entirely on this quantity. An unbiased sample estimate is

(30)  $\frac{(n_2 - 2)(\bar x_{it} - \bar x_{it'})^2}{2b_{ii}} - \frac{1}{r'}$,

where r' is the number of replicates. Since (30) involves a non-central variance ratio, confidence limits for the true sensitivity can be found from Fisher's Type C distribution, Fisher [3].

It follows from (3) and (29) that if two scales are linearly related (including the case of equivalence) their relative sensitivity is constant for all treatment comparisons. For scale 1 relative to scale 2 the sensitivity is measured by

$\beta^2\sigma_{22}/(\alpha^2\sigma_{11})$.

If the scales are equivalent, apart possibly from a constant difference, this quantity reduces to $\phi = \sigma_{22}/\sigma_{11}$, for which $F = b_{22}/b_{11}$ serves as a sample estimate.

A test of significance of the sample ratio and confidence limits for the true ratio may be obtained from Pitman [12], who showed that

(31)  $\frac{F - \phi}{\sqrt{(F + \phi)^2 - 4r_{12}^2F\phi}}$

follows the distribution of a sample correlation coefficient from $(n_2 + 1)$ pairs of observations. In (31), $r_{12}^2 = b_{12}^2/(b_{11}b_{22})$. The same procedure may be used whenever α and β are known.
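Pitman's statistic is easy to compute. The sketch below uses plain Python, and the function names and input values are hypothetical. The conversion of a null correlation coefficient from $n_2 + 1$ pairs to Student's t with $n_2 - 1$ degrees of freedom is the standard one. It is added here for convenience and is not taken from the text.

```python
import math

def pitman_statistic(b11, b22, b12, phi):
    """Pitman's statistic (31) for the hypothesis sigma_22 / sigma_11 = phi."""
    F = b22 / b11
    r2 = b12 ** 2 / (b11 * b22)
    return (F - phi) / math.sqrt((F + phi) ** 2 - 4 * r2 * F * phi)

def as_t(r, n2):
    """Treat r as a null correlation from n2 + 1 pairs: Student's t with n2 - 1 d.f."""
    return r * math.sqrt(n2 - 1) / math.sqrt(1 - r * r), n2 - 1

r = pitman_statistic(b11=10.0, b22=20.0, b12=5.0, phi=1.0)   # hypothetical values
t, df = as_t(r, n2=20)
print(r, t, df)
```

Because $r_{12}^2 \le 1$, the statistic always lies between −1 and 1, as a correlation coefficient must.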

When α and β are unknown, a sample estimate of the relative sensitivity is $b^2b_{22}/(a^2b_{11})$, where $(ax_1 + bx_2)$ is the discriminant function which corresponds to the lower root of equation (21). We have not been able to reach the distribution of this estimate. Confidence limits for the relative sensitivity can, however, be obtained when $n_2$ is sufficiently large so that $\sigma_{11}$ and $\sigma_{22}$ may be assumed known. For in that case the problem reduces to that of finding confidence limits for β/α. Now if α, β are the true coefficients, the quantity

(32)  $\frac{\alpha^2a_{11} + 2\alpha\beta a_{12} + \beta^2a_{22}}{\alpha^2b_{11} + 2\alpha\beta b_{12} + \beta^2b_{22}}$

is distributed as $n_1e^{2z}/n_2$, where z has $n_1$ and $n_2$ degrees of freedom. Any proposed values of α and β which make (32) significant are rejected by the evidence of the sample. By equating (32) to the desired significance level of $n_1e^{2z}/n_2$, we get a quadratic equation for the two limits of β/α. The limits will not be narrow unless the treatment effects are large.
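Dividing the numerator and denominator of (32) by $\alpha^2$ and writing λ = β/α, then setting the result equal to a critical value c, gives the quadratic $(a_{22} - cb_{22})\lambda^2 + 2(a_{12} - cb_{12})\lambda + (a_{11} - cb_{11}) = 0$. The sketch below solves it in plain Python. The matrices and the value c = 0.5 are invented for illustration. In practice c would be taken from the $n_1e^{2z}/n_2$ distribution at the chosen significance level.

```python
import math

def ratio_limits(a, b, c):
    """Values of lam = beta/alpha at which (32) equals the critical value c.

    a, b: 2x2 nested lists of Treatments and Error sums of squares and products.
    Returns the two limits in ascending order, or None if they are not real and finite.
    """
    A = a[1][1] - c * b[1][1]
    B = 2 * (a[0][1] - c * b[0][1])
    C = a[0][0] - c * b[0][0]
    disc = B * B - 4 * A * C
    if A == 0 or disc < 0:
        return None
    root = math.sqrt(disc)
    return sorted([(-B - root) / (2 * A), (-B + root) / (2 * A)])

def criterion_32(a, b, lam):
    """Quantity (32) expressed in terms of lam = beta/alpha."""
    num = a[0][0] + 2 * lam * a[0][1] + lam * lam * a[1][1]
    den = b[0][0] + 2 * lam * b[0][1] + lam * lam * b[1][1]
    return num / den

a = [[20.0, -18.0], [-18.0, 20.0]]    # hypothetical Treatments matrix
b = [[10.0, 1.0], [1.0, 12.0]]        # hypothetical Error matrix
lims = ratio_limits(a, b, c=0.5)      # hypothetical critical value
print(lims)
```

Values of β/α between the two limits give (32) below c and are therefore not rejected.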

If the relation between the scales is non-linear, and the assumption of a con- 
stant error variance throughout an individual scale is valid, the relative sensi- 
tivity differs for different treatment comparisons. Even in this event estimates 
of relative sensitivity may be of interest. Attention might be restricted to a 
single degree of freedom from the treatment comparisons, in which case the 
definition for two treatments could be applied. 

Alternatively an estimate might be wanted of the average relative sensitivity over all treatment comparisons. For a given number of replicates, the power function of the variance-ratio test of the treatment effects in the ith scale depends only on the quantity

(33)  $\frac{\sum_t (\xi_{it} - \bar\xi_i)^2}{n_1\sigma_{ii}}$.

Consequently this quantity, which is an extension of (29), might be chosen as a measure of average sensitivity. The corresponding generalization of the unbiased sample estimate (30) is

(34)  $\frac{(n_2 - 2)a_{ii}}{n_1r'b_{ii}} - \frac{1}{r'}$.

Since the quantity $a_{ii}/b_{ii}$ is a multiple of a non-central variance ratio, the comparison of two scales involves a test of significance of the hypothesis that two non-central variance ratios are equal.
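A small Monte Carlo run illustrates that (34) is unbiased for (33). This sketch assumes NumPy and the usual normal-theory model. Under that model, $a_{ii}$ is $\sigma_{ii}$ times a non-central $\chi^2$ with $n_1$ degrees of freedom and non-centrality $r'\psi_{ii}/\sigma_{ii}$ (with $\psi_{ii} = \sum_t(\xi_{it} - \bar\xi_i)^2$), and $b_{ii}$ is an independent $\sigma_{ii}\chi^2$ with $n_2$ degrees of freedom. The parameter values are invented.

```python
import numpy as np

def sensitivity_estimate(a_ii, b_ii, n1, n2, r_prime):
    """Estimate (34) of the average sensitivity (33); n1 = 1 is the two-treatment case."""
    return (n2 - 2) * a_ii / (n1 * r_prime * b_ii) - 1.0 / r_prime

rng = np.random.default_rng(7)
n1, n2, r_prime = 3, 20, 4
sigma, psi = 1.5, 6.0          # hypothetical sigma_ii and psi_ii
size = 400_000
a_ii = sigma * rng.noncentral_chisquare(n1, r_prime * psi / sigma, size)
b_ii = sigma * rng.chisquare(n2, size)
est = sensitivity_estimate(a_ii, b_ii, n1, n2, r_prime)
print(est.mean(), psi / (n1 * sigma))   # Monte Carlo mean versus the true value of (33)
```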


10. Summary. This paper discusses the analysis of data obtained when the 
results of a replicated experiment are measured on several different scales which 
we wish to compare. Recent work in multivariate analysis provides tests of 
the hypothesis that the treatment effects are the same in all scales, and of the 
hypothesis that the scales are linearly related. When the number of Error 
degrees of freedom is large, the significance levels of these tests are obtainable 
from the standard tables. For small sample tests, further investigation and 





W. G. COCHRAN 


tabulation of certain distributions will be needed, particularly that of the sample 
canonical correlations when one population correlation differs from zero. 

A brief discussion is given of methods for comparing the relative sensitivity of 
two scales. 
ON STOCHASTIC LIMIT AND ORDER RELATIONSHIPS


By H. B. Mann and A. Wald¹

Columbia University 

1. Introduction. The concept of a stochastic limit is frequently used in 
statistical literature. Writers of papers on problems in statistics and probability 
usually prove only those special cases of more general theorems which are neces- 
sary for the solution of their particular problems. Thus readers of statistical 
papers are confronted with the necessity of laboriously ploughing through de- 
tails, a task which is made more difficult by the fact that no uniform notation 
has as yet been introduced. It is therefore the purpose of the present paper to 
outline a systematic theory of stochastic limit and order relationships and at the 
same time to propose a convenient notation analogous to the notation of ordinary 
limit and order relationships. The theorems derived in this paper are of a more general nature and seem, to the authors' knowledge, to contain all previous results in the literature. For instance the so-called δ-method for the derivation of asymptotic standard deviations and limit distributions, two lemmas by J. L. Doob [1] on products, sums and quotients of random variables, and a theorem derived by W. G. Madow [2] are special cases of our results. It is hoped that such a general theory, together with a convenient notation, will considerably facilitate the derivation of theorems concerning stochastic limits and limit distributions. In section 2 we define the notion of convergence in probability and that of stochastic order and derive 5 theorems of a very general nature. Section 3 contains 2 corollaries of these general theorems which have so far been most important in applications.

We shall frequently need the concept of a vector. A vector $a = (a^1, \ldots, a^r)$ is an ordered set of r numbers $a^1, \ldots, a^r$. The numbers $a^1, \ldots, a^r$ are called the components of a. If the components are random variables then the vector is called a random vector.

We shall generally denote by a, b constant vectors, by x, y random vectors and by $a^1, \ldots, a^r$ their components. Differing from the usual practice we shall put $|a| = (|a^1|, \ldots, |a^r|)$; we shall write $a < b$ or $a \le b$ if $a^i < b^i$ or $a^i \le b^i$ for every i. This notation saves a great amount of writing, since all our theorems except Theorem 4 are valid for sequences of any number of jointly distributed variates.

We shall review here the ordinary order notation. In all that follows let f(N) be a positive function defined for all positive integers N.

¹ Research under a grant-in-aid from the Carnegie Corporation of New York.
We write

$a_N = o[f(N)]$ if $\lim_{N\to\infty} a_N/f(N) = 0$.

$a_N = O[f(N)]$ if $|a_N| \le Mf(N)$ for all N and a fixed M > 0.

$a_N = \Omega[f(N)]$ if $0 < M'f(N) \le |a_N| \le Mf(N)$ for almost all N and for two fixed numbers M > M' > 0.

$a_N = \omega[f(N)]$ if $0 < Mf(N) \le |a_N|$ for almost all N and a fixed M > 0.

For instance, $\log N = o(N^\epsilon)$ for every ε > 0, or $\sin N/N = O(1/N)$, $(3 + 4N)/(4 + 8\sqrt N) = \Omega(\sqrt N)$, $5/\sin N = \omega(1)$.

For any statement V we shall denote by P(V) the probability that V holds.

2. General theorems on stochastic limit and order relationships.

Definition 1. We write $\operatorname{plim}_{N\to\infty} x_N = 0$ (in words: $x_N$ converges in probability to 0 with increasing N) if for every ε > 0, $\lim_{N\to\infty} P(|x_N| < \epsilon) = 1$. Further, $\operatorname{plim}_{N\to\infty} x_N = x$ if $\operatorname{plim}_{N\to\infty}(x_N - x) = 0$.

Definition 2. We write $x_N = o_p[f(N)]$ ($x_N$ is of probability order o[f(N)]) if $\operatorname{plim}_{N\to\infty} x_N/f(N) = 0$.

Definition 3. We write $x_N = O_p[f(N)]$ ($x_N$ is of probability order O[f(N)]) if for each ε > 0 there exists an $A_\epsilon > 0$ such that $P(|x_N| \le A_\epsilon f(N)) \ge 1 - \epsilon$ for all values of N.

Definition 4. $x_N = \Omega_p[f(N)]$ if for each ε > 0 there exist two numbers $A_\epsilon > 0$ and $B_\epsilon > 0$ and an integer $N_\epsilon$ such that $P[A_\epsilon f(N) \le |x_N| \le B_\epsilon f(N)] \ge 1 - \epsilon$ for all $N \ge N_\epsilon$.

Definition 5. $x_N = \omega_p[f(N)]$ if for every ε > 0 there exists an $A_\epsilon > 0$ and an integer $N_\epsilon$ such that $P[A_\epsilon f(N) \le |x_N|] \ge 1 - \epsilon$ for all $N \ge N_\epsilon$.
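Definitions 1 to 3 can be illustrated by simulation. This sketch assumes NumPy, and the uniform variates, the bound $A_\epsilon$ and the sample sizes are arbitrary choices. The deviation $x_N$ of a sample mean from 1/2 is $O_p(N^{-1/2})$. The estimated probability that $|x_N| \le A_\epsilon N^{-1/2}$ stays at about the same high level for every N. Meanwhile, the estimated probability that $|x_N| < 0.05$ tends to 1, as convergence in probability to 0 requires. The simulation illustrates the definitions but proves nothing.

```python
import numpy as np

rng = np.random.default_rng(8)
reps = 2000
A_eps = 2.5 * np.sqrt(1.0 / 12.0)   # arbitrary bound; sqrt(1/12) is the s.d. of a uniform(0, 1) variate

bounded = {}    # estimates of P(|x_N| <= A_eps * N**-0.5)   (Definition 3, f(N) = N**-0.5)
small = {}      # estimates of P(|x_N| < 0.05)               (Definition 1, x_N -> 0)
for N in (10, 100, 1000):
    x_N = rng.random((reps, N)).mean(axis=1) - 0.5     # one deviation per replicate
    bounded[N] = np.mean(np.abs(x_N) <= A_eps / np.sqrt(N))
    small[N] = np.mean(np.abs(x_N) < 0.05)
print(bounded)
print(small)
```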

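Let E denote a vector space. For any subset E' of E the symbol $a \subset E'$ will mean that a is an element of E'.

Since $P(x \subset E_1 \,\&\, x \subset E_2) \ge P(x \subset E_1) - P(x \not\subset E_2)$ we evidently have

Lemma 1. If $P(x \subset E_1) \ge 1 - \epsilon$ and $P(x \subset E_2) \ge 1 - \epsilon'$, then $P(x \subset E_1;\ x \subset E_2) \ge 1 - \epsilon - \epsilon'$.

We now put $O^1 = o$, $O^2 = O$, $O^3 = \Omega$, $O^4 = \omega$, and correspondingly $O_p^1 = o_p$, $O_p^2 = O_p$, $O_p^3 = \Omega_p$, $O_p^4 = \omega_p$.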

Theorem 1. For every ε > 0 let $\{R_N(\epsilon)\}$ be a sequence of subsets of the r-dimensional Cartesian space such that $P(x_N \subset R_N(\epsilon)) \ge 1 - \epsilon$ for all N greater than a certain integer $N_\epsilon$. Let $\{g_N(x)\}$ be a sequence of functions of $x = (x^1, x^2, \ldots, x^r)$ such that $g_N(a_N) = O^i[f(N)]$ for any ε > 0 and for any sequence $\{a_N\}$ for which $a_N \subset R_N(\epsilon)$. Then we have $g_N(x_N) = O_p^i[f(N)]$.

Proof: For i = 1, 2, 3, there exists a positive integer $N'_\epsilon$ such that $|g_N(a)|$ is a bounded function of a in $R_N(\epsilon)$ for $N > N'_\epsilon$. For otherwise we could construct a sequence $\{a_N\}$ with $a_N$ in $R_N(\epsilon)$ such that $|g_N(a_N)| > Mf(N)$ for any M and for infinitely many values of N, which contradicts the hypothesis of our theorem. Hence there exists an $N'_\epsilon$ such that for $N > N'_\epsilon$ the function $|g_N(a)|$ is bounded in $R_N(\epsilon)$. Let $M_N(\epsilon)$ be the l.u.b. of $|g_N(a)|/f(N)$ in $R_N(\epsilon)$. We can construct a sequence $\{a_N\}$ with $a_N \subset R_N(\epsilon)$ such that $|g_N(a_N)|/f(N) > M_N(\epsilon)/2$ for all $N > N'_\epsilon$. Hence for i = 2, 3 the sequence $M_N(\epsilon)$ must be bounded and for i = 1 we must have $\lim_{N\to\infty} M_N(\epsilon) = 0$. Let $M(\epsilon)$ be the l.u.b. of $M_N(\epsilon)$ for $N > N'_\epsilon$. For i = 3, 4 one shows in exactly the same manner the existence of a g.l.b. $M'(\epsilon)$ of $|g_N(a)|/f(N)$ for $a \subset R_N(\epsilon)$ and $N > N''_\epsilon$. Hence for sufficiently large N we have

$P[|g_N(x_N)| \le M_N(\epsilon)f(N)] \ge 1 - \epsilon$ with $\lim_{N\to\infty} M_N(\epsilon) = 0$, for i = 1,

$P[|g_N(x_N)| \le M(\epsilon)f(N)] \ge 1 - \epsilon$, for i = 2,

$P[M'(\epsilon)f(N) \le |g_N(x_N)| \le M(\epsilon)f(N)] \ge 1 - \epsilon$, for i = 3,

$P[M'(\epsilon)f(N) \le |g_N(x_N)|] \ge 1 - \epsilon$, for i = 4.

For i = 2 the existence of an $M''(\epsilon)$ such that $P[|g_N(x_N)| \le M''(\epsilon)f(N)] \ge 1 - \epsilon$ for all N follows easily from this result. Hence our theorem is proved.

Corollary 1. Suppose that $x_N^j = O_p^{i_j}[f_j(N)]$ for j = 1, 2, ..., r, and that $\{R_N(\epsilon)\}$ is a sequence of subsets of the k-dimensional space such that $P[y_N \subset R_N(\epsilon)] \ge 1 - \epsilon$ for sufficiently large N. Suppose further that $\{g_N(x^1, \ldots, x^r, y^1, \ldots, y^k)\}$ is a sequence of functions of $x^1, \ldots, x^r, y^1, \ldots, y^k$ such that, for any ε > 0, $g_N(a_N, b_N) = O^i[f(N)]$ for every sequence $\{a_N, b_N\}$ with $a_N^j = O^{i_j}[f_j(N)]$ (j = 1, 2, ..., r) and $b_N \subset R_N(\epsilon)$. Then $g_N(x_N, y_N) = O_p^i[f(N)]$.

Proof: It follows from Lemma 1, the definition of the relation $x_N^j = O_p^{i_j}[f_j(N)]$ and the hypothesis of our corollary that for any ε > 0 there exists a sequence of subsets $\{\bar R_N(\epsilon)\}$ of the space of $(x^1, \ldots, x^r, y^1, \ldots, y^k)$ which satisfies the conditions of Theorem 1 with respect to the sequence of functions $\{g_N\}$. Hence Corollary 1 is an immediate consequence of Theorem 1.

Corollary 1 implies inter alia that all operational rules for the ordinary order and limit relations are also applicable to stochastic limit and order relations. For instance $o[f(N)]/\Omega[g(N)] = o[f(N)/g(N)]$. Hence also $o_p[f(N)]/\Omega_p[g(N)] = o_p[f(N)/g(N)]$.

Definition 6. For any N let $R_N$ be a region and $f_N(a)$ a function defined on $R_N$. The sequence $\{f_N(a)\}$ will be said to be uniformly continuous with respect to $\{R_N\}$ if the following condition is fulfilled. For every ε > 0 there exists a vector δ > 0 such that for almost all N

$|f_N(a + \delta') - f_N(a)| < \epsilon$ for any $|\delta'| < \delta$ and for any $a \subset R_N$.

Theorem 2. Let $\operatorname{plim}_{N\to\infty}(x_N - y_N) = 0$. For every ε > 0 let $\{R_N(\epsilon)\}$ be a sequence of subsets of the r-dimensional vector space such that for almost all N we have $P[y_N \subset R_N(\epsilon)] \ge 1 - \epsilon$. If the sequence of functions $\{f_N(a)\}$ is uniformly continuous with respect to $\{R_N(\epsilon)\}$ for every ε > 0, then $\operatorname{plim}_{N\to\infty}[f_N(x_N) - f_N(y_N)] = 0$.

Proof: We have $f_N(x_N) - f_N(y_N) = f_N(y_N + z_N) - f_N(y_N)$ where $z_N^j = o_p(1)$ for j = 1, ..., r. Because of the uniform continuity of $f_N(a)$ with respect to $R_N(\epsilon)$ we see that for every sequence $\{a_N, b_N\}$ with $a_N \subset R_N(\epsilon)$ and $b_N^j = o(1)$ (j = 1, 2, ..., r),

$f_N(a_N + b_N) - f_N(a_N) = o(1)$.

Hence Theorem 2 follows from Corollary 1.

In the following we shall abbreviate "cumulative distribution function" by d.f.

Definition 7. Let $\{x_N\}$ be a sequence of random variables. Let $F_N$ be the d.f. of $x_N$. Let x have the distribution F. We shall write $d_\infty(x_N) = d(x)$ if $\lim_{N\to\infty} F_N = F$ in every continuity point of F.

Theorem 3. Let $\operatorname{plim}_{N\to\infty}(x_N - y_N) = 0$ and $d_\infty(y_N) = d(y)$; then $d_\infty(x_N) = d(y)$.

Proof: Let $G_N$, $F_N$ be the d.f.'s of $x_N$, $y_N$ respectively. For any δ > 0 we have

$P(y_N < a + \delta) \ge P(x_N < a;\ y_N < a + \delta) \ge P(x_N < a;\ |y_N - x_N| < \delta) \ge P(x_N < a) - P(|y_N - x_N| \ge \delta)$,

$P(x_N < a) \ge P(x_N < a;\ y_N < a - \delta) \ge P(y_N < a - \delta) - P(|x_N - y_N| \ge \delta)$.

Hence, since $P(y_N < a) = F_N(a)$, $P(x_N < a) = G_N(a)$ and $\lim_{N\to\infty} P(|x_N - y_N| \ge \delta) = 0$, we have

$\limsup F_N(a + \delta) \ge \limsup G_N(a) \ge \liminf G_N(a) \ge \liminf F_N(a - \delta)$.

If a + δ and a − δ are continuity points of F we have

$F(a + \delta) \ge \limsup G_N(a) \ge \liminf G_N(a) \ge F(a - \delta)$.

For any $\delta_0 > 0$ there exists a positive $\delta < \delta_0$ such that a − δ and a + δ are continuity points of F. Hence we can choose δ arbitrarily small, and if a is a continuity point of F we must have

$\lim_{N\to\infty} G_N(a) = F(a)$.

Theorem 4. Let $x_N$, $y_N$ be two sequences of one-dimensional vectors and let $\operatorname{plim}_{N\to\infty}(x_N - y_N) = 0$. Let $F_N$, $G_N$ be the cumulative distribution functions of $x_N$ and $y_N$ respectively. Let $R_N(\epsilon)$ be the set of points a for which $|F_N(a) - G_N(a)| \ge \epsilon$. Let $M_N(\epsilon)$ be the Lebesgue measure of this set. Then $\lim_{N\to\infty} M_N(\epsilon) = 0$ for every ε > 0.

We first prove the following lemma.

Lemma 2. Let δ, ε be any positive numbers and let f be a distribution function. The set of points a for which $f(a + \delta) - f(a) \ge \epsilon$ has Lebesgue measure at most δ/ε.

Proof: The points a for which $f(a + \delta) - f(a) \ge \epsilon$ must have a lower bound. Otherwise we could find infinitely many such points whose distance from each other is more than δ. But this contradicts the requirement that $f(\infty) = 1$. Let $a_1$ be the g.l.b. of the a's. Then for any η > 0, the value of f increases by at least ε in the interval $(a_1 \le x \le a_1 + \delta + \eta)$. Let now $a_2$ be the g.l.b. of the a's outside of this interval. We continue our construction by constructing the interval $(a_2 \le x \le a_2 + \delta + \eta)$ and so forth. But after at most 1/ε such steps the construction must stop. Hence all points a for which $f(a + \delta) - f(a) \ge \epsilon$ are contained in at most 1/ε intervals of length δ + η. Hence, since η was arbitrary, the Lebesgue measure of this set is at most δ/ε.

We come now to the proof of our theorem. We have

$P(x_N < a) \ge P(x_N < a;\ y_N < a + \delta) \ge P(x_N < a) - P(|x_N - y_N| \ge \delta)$,

$P(y_N < a + \delta) \ge P(x_N < a;\ y_N < a + \delta) \ge P(y_N < a + \delta) - P(|x_N - y_N| \ge \delta) - P(a \le x_N < a + 2\delta)$.

Therefore

$P(x_N < a;\ y_N < a + \delta) = P(x_N < a) - \theta_N P(|x_N - y_N| \ge \delta)$
$\qquad = P(y_N < a + \delta) - \theta'_N P(|x_N - y_N| \ge \delta) - \theta''_N P(a \le x_N < a + 2\delta)$,

where $0 \le \theta_N \le 1$ and $0 \le \theta'_N, \theta''_N \le 1$. Hence

$P(y_N < a + \delta) = P(x_N < a) + \phi_N P(|x_N - y_N| \ge \delta) + \psi_N[F_N(a + 2\delta) - F_N(a)]$,

where $|\phi_N|, |\psi_N| \le 1$.

By hypothesis we have $P(|x_N - y_N| \ge 1/m) < 1/m$ for almost all N and every integer m. Hence we can choose a sequence $\{\delta_N\}$ with $\delta_N > 0$ in such a way that $\lim_{N\to\infty}\delta_N = 0$ and $\lim_{N\to\infty} P(|x_N - y_N| \ge \delta_N) = 0$. We can then choose $N_\epsilon$ so that $P(|x_N - y_N| \ge \delta_N) < \epsilon/3$ for $N > N_\epsilon$. Applying Lemma 2 we see that, except for a set of measure at most $6\delta_N/\epsilon$, we have $F_N(a + 2\delta_N) - F_N(a) < \epsilon/3$. Similarly the set of points for which $G_N(a + \delta_N) - G_N(a) \ge \epsilon/3$ has Lebesgue measure at most $3\delta_N/\epsilon$. Hence, except in a set of points whose measure is at most $9\delta_N/\epsilon$, we have

$|G_N(a) - F_N(a)| < \epsilon$,

and this completes the proof of Theorem 4.

Theorem 4a. Let $\operatorname{plim}_{N\to\infty}(x_N - y_N) = 0$. Let $F_N$, $G_N$ be the distribution functions of $x_N$, $y_N$ respectively. Furthermore, let $R_N(\epsilon)$ be the set of points inside an r-dimensional cube where $|F_N - G_N| \ge \epsilon$ and let $M_N(\epsilon)$ be the Lebesgue measure of $R_N(\epsilon)$; then $\lim_{N\to\infty} M_N(\epsilon) = 0$.

We prove first

Lemma 2a. Let $\delta = (\delta^1, \delta^2, \ldots, \delta^r) > 0$ and $\max_i \delta^i = d$. Let I be the cube defined by $(-A \le x^i \le A,\ i = 1, 2, \ldots, r)$. Let furthermore f be a d.f. Then the Lebesgue measure of the points a in I for which $f(a + \delta) - f(a) \ge \epsilon$ is at most $dr^2(2A)^{r-1}/\epsilon$.

Proof: Let /i(a:^), Uio^), • • • frix) be the marginal distributions of • • • 
respectively. It follows from Lemma 2 that the linear Lebesgue measure of 
those numbers 6/ for which /i(a* + 6*) — /»(a*) > c/r is smaller than rd/e. We 
form the set (a;" = a* & x C 1) for every such cl and for t == 1, 2, • • • r. The 
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Lebesgue measure of the sum /?(€) of all these sets is at most We 

shall show that R{€) contains all points a inside I for which /(o + 5) — /(«) 5: «• 
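We have

$$f(a^1 + \delta^1, \ldots, a^r + \delta^r) - f(a^1, \ldots, a^r) = \Delta_1 + \Delta_2 + \cdots + \Delta_r,$$

where $\Delta_i = f(a^1, \ldots, a^{i-1}, a^i + \delta^i, \ldots, a^r + \delta^r) - f(a^1, \ldots, a^i, a^{i+1} + \delta^{i+1}, \ldots, a^r + \delta^r)$. If $f(a + \delta) - f(a) \ge \varepsilon$, then for at least one $i$ we must have

$$\Delta_i \ge \varepsilon/r.$$

But $\Delta_i$ is the probability of a subset of the set $T = (a^i \le x^i < a^i + \delta^i)$, and $f_i(a^i + \delta^i) - f_i(a^i)$ is the probability of $T$ itself. Hence

$$\varepsilon/r \le \Delta_i \le f_i(a^i + \delta^i) - f_i(a^i),$$

and if $(a^1, a^2, \ldots, a^r)$ is in $I$ then it is contained in $R(\varepsilon)$. Hence Lemma 2a is proved.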
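The telescoping decomposition used in the proof can be checked numerically. The sketch below uses a hypothetical three-dimensional d.f. with independent standard normal coordinates. It computes $\Delta_1, \ldots, \Delta_r$ as defined above, with 0-based indices in the code, and confirms two things: they sum to $f(a + \delta) - f(a)$, and each is bounded by the corresponding marginal increment $f_i(a^i + \delta^i) - f_i(a^i)$. The point $a$ and the shifts $\delta$ are arbitrary choices.

```python
import math

def phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f(x):
    """Hypothetical r-dimensional d.f.: independent standard normal coordinates."""
    return math.prod(phi(xi) for xi in x)

def increments(f, a, d):
    """Delta_i = f(a_1..a_{i-1}, a_i + d_i, ..., a_r + d_r)
               - f(a_1..a_i, a_{i+1} + d_{i+1}, ..., a_r + d_r)  (0-based in code)."""
    r = len(a)
    out = []
    for i in range(r):
        upper = list(a[:i]) + [a[j] + d[j] for j in range(i, r)]
        lower = list(a[:i + 1]) + [a[j] + d[j] for j in range(i + 1, r)]
        out.append(f(upper) - f(lower))
    return out

a = [0.3, -0.5, 1.0]
d = [0.2, 0.1, 0.05]
parts = increments(f, a, d)
total = f([ai + di for ai, di in zip(a, d)]) - f(a)
marginal = [phi(ai + di) - phi(ai) for ai, di in zip(a, d)]
print(sum(parts), total)
```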

The proof of Theorem 4a using Lemma 2a is similar to that of Theorem 4 and 
therefore it is omitted. 

The Jordan measure of a set $R$ with respect to the distribution function $F$ is defined as follows. We consider only intervals whose boundary points are continuity points of $F$. We cover $R$ with the sum $I$ of a finite number of intervals. (The intervals themselves may also be infinite; for instance, the sets $-\infty < x < a$ and $a < x < \infty$ are also considered intervals.) We consider $M(I) = \int_I dF$ for every $I$ covering $R$. The g.l.b. of all such $M(I)$ is called the exterior Jordan measure $\overline{M}(R)$ of $R$. Similarly, we consider all sums $I$ of a finite number of intervals which are contained in $R$. The l.u.b. of $\int_I dF$ is called the interior Jordan measure $\underline{M}(R)$ of $R$. If $\overline{M}(R) = \underline{M}(R)$, then this common value $M(R)$ is called the Jordan measure of $R$.

Lemma 3. Let $F_N(x)$ be a sequence of d.f.'s such that $\lim_{N\to\infty} F_N(x) = F(x)$ in every continuity point of $F(x)$. Let $h(x)$ be a bounded function such that the discontinuity points of $h(x)$ have Jordan measure 0 with respect to $F$, and such that $\int_{-\infty}^{+\infty} h(x)\,dF_N(x)$ and $\int_{-\infty}^{+\infty} h(x)\,dF(x)$ exist. Then

$$\lim_{N\to\infty} \int_{-\infty}^{+\infty} h(x)\,dF_N(x) = \int_{-\infty}^{+\infty} h(x)\,dF(x).$$


Proof: There is only an enumerable set of hyperplanes parallel to the plane $x^i = 0$ which have positive probability with respect to $F$. Hence for every $\delta$ we can find an interval net whose cells have a diameter at most $\delta$ and such that the boundary points of every cell are continuity points of $F$.

We first determine a closed finite interval $I$ such that $\int_I dF(x) \ge 1 - \varepsilon/2$ and such that the boundary points of $I$ are continuity points of $F$. We further determine a sum $I'$ of a finite number of open intervals such that $I'$ contains all discontinuity points of $h$, $\int_{I'} dF(x) < \varepsilon/2$, and the boundary of $I'$ does not contain any discontinuity points of $F$. All this is possible by hypothesis and because the set of hyperplanes with positive probability is enumerable. Let $R$ be the subset of $I$ consisting of all points of $I$ which are not contained in $I'$. $R$ is a closed set and can be decomposed into a finite number of intervals. The function $h$ is continuous in $R$ and therefore uniformly continuous. We can therefore cover $R$ by a finite set of intervals such that the variation of $h$ in every interval is less than $\varepsilon$ and such that the boundary points of each interval are continuity points of $F$. Let $I_1, I_2, \ldots, I_k$ be such a finite set of intervals, and let $x_j$ be any point in $I_j$. We have


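$$\begin{aligned}
H_N &= \left| \int_{-\infty}^{+\infty} h(x)\,dF_N(x) - \int_{-\infty}^{+\infty} h(x)\,dF(x) \right| \\
&= \Bigl| \sum_{j=1}^{k} \int_{I_j} [h(x) - h(x_j)]\,dF_N(x) - \sum_{j=1}^{k} \int_{I_j} [h(x) - h(x_j)]\,dF(x) + \sum_{j=1}^{k} h(x_j)\Bigl[\int_{I_j} dF_N(x) - \int_{I_j} dF(x)\Bigr] \\
&\qquad + \int_{x \not\subset R} h(x)\,dF_N(x) - \int_{x \not\subset R} h(x)\,dF(x) \Bigr| \\
&\le \varepsilon + \varepsilon + \Bigl| \sum_{j=1}^{k} h(x_j)\Bigl[\int_{I_j} dF_N(x) - \int_{I_j} dF(x)\Bigr] \Bigr| + \max |h(x)| \Bigl[\int_{x \not\subset R} dF_N(x) + \varepsilon\Bigr].
\end{aligned}$$

But $\lim_{N\to\infty} \int_R dF_N(x) \ge 1 - \varepsilon$. Hence

$$\limsup_{N\to\infty} H_N \le 2\varepsilon + 2\varepsilon \max |h(x)|.$$

Since $\varepsilon$ was arbitrary, we must have $\lim_{N\to\infty} H_N = 0$.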
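A one-dimensional sketch shows why Lemma 3 needs the Jordan-measure hypothesis. Take $h(x) = 1$ for $x < 0$ and $h(x) = 0$ otherwise, so its only discontinuity is at $0$. Both sequences below are illustrative assumptions.

- **Normal sequence.** $F_N$ is the normal d.f. with mean $1/N$ and variance 1. The point $0$ has $F$-measure 0, and $\int h\,dF_N = \Phi(-1/N) \to 1/2 = \int h\,dF$.
- **Atom sequence.** $F_N$ is a unit mass at $-1/N$, and $F$ is the unit mass at $0$, so $F_N \to F$ at every continuity point of $F$. But the discontinuity of $h$ now carries $F$-probability 1. Here $\int h\,dF_N = 1$ for every $N$, while $\int h\,dF = 0$.

```python
import math

def phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(x):
    """Bounded function with a single discontinuity at x = 0."""
    return 1.0 if x < 0 else 0.0

def integral_normal(N):
    # F_N = normal d.f. with mean 1/N, variance 1:  int h dF_N = P(X_N < 0) = Phi(-1/N)
    return phi(-1.0 / N)

def integral_atom(N):
    # F_N = unit mass at -1/N:  int h dF_N = h(-1/N)
    return h(-1.0 / N)

LIMIT_NORMAL = 0.5     # int h dF for the standard normal F
LIMIT_ATOM = h(0.0)    # int h dF for the unit mass at 0, which is 0.0

for N in (10, 1000, 100000):
    print(N, integral_normal(N), integral_atom(N))
```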

We are now prepared to prove 

Theorem 5. Let $d_\infty(x_N) = d(x)$. Let $g(x)$ be a Borel measurable function such that the set $R$ of discontinuity points of $g(x)$ is closed and $P(x \subset R) = 0$. Then $d_\infty[g(x_N)] = d[g(x)]$.

Proof: Let $F_N$ be the d.f. of $x_N$ and $F$ the d.f. of $x$, and let $F_{Ng}$, $F_g$ be the d.f.'s of $g(x_N)$, $g(x)$ respectively. Then $\lim_{N\to\infty} F_N = F$ in every continuity point of $F$. Let $h(x)$ be defined as follows:

$$h(x) = 1 \text{ if } g(x) < a, \qquad h(x) = 0 \text{ if } g(x) \ge a.$$

The discontinuities of $h$ are contained in the set $M$ of all points where either $g(x) = a$ and $g$ is continuous, or $g(x)$ is discontinuous. The set $R$ of discontinuity points of $g(x)$ is closed and of measure 0 with respect to $F$. We can therefore subtract from $M$ a sum $R^*$ of a finite number of open intervals which contains all discontinuity points of $g(x)$ and has arbitrarily small measure with respect to $F$. This difference set $M'$ is closed and contains only points where $g(x) = a$ and $x \not\subset R$. If $a$ is a continuity point of $F_g$, then the Borel measure of $M'$ with respect to $F$ is 0. Since $M'$ is closed, its Jordan measure is also 0. Hence the Jordan measure of the discontinuity points of $h(x)$ is 0 if $a$ is a continuity point of $F_g$.

The integrals $\int_{-\infty}^{+\infty} h(x)\,dF_N(x) = F_{Ng}(a)$ and $\int_{-\infty}^{+\infty} h(x)\,dF(x) = F_g(a)$ exist for every $a$. Hence by Lemma 3, $\lim_{N\to\infty} F_{Ng}(a) = F_g(a)$ in every continuity point of $F_g$, and this proves our theorem.

3. Corollaries and applications. Corollary 2. If $\operatorname{plim}_{N\to\infty}(x_N - y_N) = 0$, $d_\infty(y_N) = d(y)$, and $f$ is continuous except in a set $R$ for which $\lim_{N\to\infty} P(y_N \subset R) = 0$, then $\operatorname{plim}_{N\to\infty}[f(x_N) - f(y_N)] = 0$.

Proof: Let $I$ be a closed interval such that $P(y_N \subset I) > 1 - \varepsilon/2$. Let $I'$ be a sum of open intervals containing all discontinuity points of $f(x)$ in $I$ and such that $P(y_N \subset I') < \varepsilon/2$ for sufficiently large $N$. The set $J$ of points of $I$ which are not points of $I'$ is a closed set. Hence $f$ is uniformly continuous in $J$, and $P(y_N \subset J) > 1 - \varepsilon$ for sufficiently large $N$. In Theorem 2 we put $R_N(\varepsilon) = J$ and $f_N = f$. Then all conditions of Theorem 2 are satisfied, and it follows that $\operatorname{plim}_{N\to\infty}[f(x_N) - f(y_N)] = 0$.

If, moreover, the set of discontinuity points of $f$ is closed, then by Theorems 3 and 5, $d_\infty[f(x_N)] = d_\infty[f(y_N)] = d[f(y)]$.

Special cases of Corollary 2 have been proved by J. L. Doob and W. G. Madow [2].

Theorem 5 is very useful in deriving limit distributions.

It follows, for instance, from Theorem 5 that if $d_\infty(x_N) = d(x)$ and $d_\infty(y_N) = d(y)$, where $x$, $y$ are independently and normally distributed with mean 0 and equal variances, then $d_\infty(x_N/y_N) = d(x/y)$. That is to say, the distribution of $x_N/y_N$ converges to a Cauchy distribution.
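A small simulation illustrates this consequence of Theorem 5. For the sketch, assume $x_N$ and $y_N$ are independent standardized sums of $N$ uniform variables on $(-1, 1)$. Then the pair $(x_N, y_N)$ converges jointly to independent standard normals. The empirical d.f. of $x_N/y_N$ is compared with the Cauchy d.f. $\tfrac12 + \arctan(t)/\pi$. The sample sizes and seed are arbitrary.

```python
import math
import random

def standardized_sum(rng, N):
    """Sum of N uniform(-1, 1) variables divided by its standard deviation sqrt(N/3)."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(N)) / math.sqrt(N / 3.0)

def cauchy_cdf(t):
    return 0.5 + math.atan(t) / math.pi

def empirical_ratio_cdf(t, N=30, reps=10_000, seed=1):
    """Fraction of simulated ratios x_N / y_N that are <= t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x_n = standardized_sum(rng, N)
        y_n = standardized_sum(rng, N)
        if x_n / y_n <= t:
            hits += 1
    return hits / reps

results = {t: empirical_ratio_cdf(t) for t in (-1.0, 0.0, 2.0)}
for t, p in results.items():
    print(t, p, cauchy_cdf(t))
```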

It also follows from Theorem 5 that under very general conditions the limit distribution of $t = \sqrt{N}(\bar{x} - \mu)/s$ is normal ($\bar{x}$ = sample mean, $\mu$ = population mean, $s^2$ = sample variance). For under very general conditions we have $d_\infty[\sqrt{N}(\bar{x} - \mu)] = d(\xi)$ and $\operatorname{plim} s = \sigma$, where $\xi$ is normally distributed with variance $\sigma^2$.
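A simulation sketch of this statement, assuming a skewed exponential population with $\mu = 1$, $N = 300$, and 3000 replications (all arbitrary choices). It compares the empirical d.f. of $t$ with the standard normal d.f. at a few points. With a skewed population some discrepancy of order $N^{-1/2}$ remains at finite $N$.

```python
import math
import random

def t_statistic(sample, mu):
    """sqrt(N) * (mean - mu) / s, with s^2 the sample variance (divisor N - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * (mean - mu) / math.sqrt(var)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fraction_below(zs, N=300, reps=3_000, seed=2):
    """Empirical P(t <= z) for each z, sampling from an exponential population (mu = 1)."""
    rng = random.Random(seed)
    ts = [t_statistic([rng.expovariate(1.0) for _ in range(N)], 1.0)
          for _ in range(reps)]
    return {z: sum(t <= z for t in ts) / reps for z in zs}

results = fraction_below((-1.0, 0.0, 1.0))
for z, p in results.items():
    print(z, p, normal_cdf(z))
```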

Applying Theorem 5, it can also easily be shown that under very general conditions the limit distribution of $T^2$ is a chi-square distribution if the means of all variates are 0. Hotelling's $T^2$ (the generalized Student ratio) for a $p$-variate distribution is defined as follows:

$$T^2 = N \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij} \bar{x}_i \bar{x}_j, \qquad \text{where } \|s^{ij}\| = \|s_{ij}\|^{-1},$$

and $s_{ij}$ is the sample covariance between $x^i$ and $x^j$.

We have $d_\infty(s^{ij}) = d(\sigma^{ij})$, where $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$. If $E(x^i) = 0$ for $i = 1, 2, \ldots, p$, then $d_\infty(\sqrt{N}\,\bar{x}_i) = d(\eta_i)$, where the $\eta_i$ have a joint normal distribution with covariance matrix $\|\sigma_{ij}\|$. Hence

$$d_\infty(T^2) = d\Bigl(\sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} \eta_i \eta_j\Bigr) = d\Bigl(\sum_{i=1}^{p} \eta_i'^2\Bigr),$$

where the $\eta_i'$ are normally and independently distributed with variance 1. Hence the distribution of $T^2$ converges to a chi-square distribution with $p$ degrees of freedom.
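Here is a simulation sketch of this limit for $p = 2$. The data come from a hypothetical mean-zero, correlated, non-normal population built from centred exponential variables. $T^2$ is computed using an explicit $2 \times 2$ inverse of the sample covariance matrix. The empirical distribution is then compared with the chi-square law with 2 degrees of freedom, whose d.f. is $1 - e^{-t/2}$. All numerical settings are assumptions.

```python
import math
import random

def hotelling_t2(xs, ys):
    """T^2 = N * xbar' S^{-1} xbar for bivariate data; S uses divisor N - 1."""
    N = len(xs)
    mx, my = sum(xs) / N, sum(ys) / N
    sxx = sum((x - mx) ** 2 for x in xs) / (N - 1)
    syy = sum((y - my) ** 2 for y in ys) / (N - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (N - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form with the inverse [[syy, -sxy], [-sxy, sxx]] / det
    return N * (syy * mx ** 2 - 2 * sxy * mx * my + sxx * my ** 2) / det

def simulate_t2(N=200, reps=3_000, seed=3):
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        e1 = [rng.expovariate(1.0) - 1.0 for _ in range(N)]
        e2 = [rng.expovariate(1.0) - 1.0 for _ in range(N)]
        ys = [a + 0.5 * b for a, b in zip(e1, e2)]   # correlated with e1, mean 0
        values.append(hotelling_t2(e1, ys))
    return values

values = simulate_t2()
mean_t2 = sum(values) / len(values)
cutoff = -2.0 * math.log(0.05)   # 95% point of chi-square with 2 d.f. (about 5.99)
coverage = sum(v <= cutoff for v in values) / len(values)
print(mean_t2, coverage)         # chi-square(2) has mean 2 and P(<= cutoff) = 0.95
```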

If the samples are drawn from a sequence of populations $\{\pi_N\}$, all with the same covariance matrix and such that $\lim_{N\to\infty} \sqrt{N}\,\mu_{iN}$ exists, where $\mu_{iN}$ is the mean value of the $i$th variate in the $N$th population, then one sees in exactly the same way that the limit distribution of $T^2$ is a non-central chi-square distribution with $p$ degrees of freedom.

The limit distribution of $T^2$ has been derived by W. G. Madow [2].
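Corollary 3. Let $x_N$, $y_N$ be $r$-dimensional vectors with $d_\infty(y_N) = d(y)$ and $x_N - y_N = O_p[f(N)]$, where $\lim_{N\to\infty} f(N) = 0$. Let $g(x)$ be a function admitting continuous $j$th derivatives except in a set $R$ with $\lim_{N\to\infty} P(y_N \subset R) = 0$. Let

$$T_j(x, a) = \sum_{i=1}^{r} \Bigl(\frac{\partial g}{\partial x^i}\Bigr)_{x=a} (x^i - a^i) + \cdots + \frac{1}{j!} \sum_{i_1, \ldots, i_j = 1}^{r} \Bigl(\frac{\partial^j g}{\partial x^{i_1} \cdots \partial x^{i_j}}\Bigr)_{x=a} (x^{i_1} - a^{i_1}) \cdots (x^{i_j} - a^{i_j});$$

then

$$g(x_N) - g(y_N) - T_j(x_N, y_N) = o_p\{[f(N)]^j\}.$$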


Since the $j$th derivatives are continuous except in a set of limit measure 0, we can determine a closed set $R(\varepsilon)$ on which they are uniformly continuous and such that $P(y_N \subset R(\varepsilon)) > 1 - \varepsilon$ for sufficiently large $N$. Then for every sequence $\{a_N, b_N\}$ with $a_N - b_N = O[f(N)]$ and $b_N \subset R(\varepsilon)$ we have

$$g(a_N) - g(b_N) - T_j(a_N, b_N) = o\{[f(N)]^j\}.$$

Hence Corollary 3 follows from Theorem 1.

Corollary 3 was first proved by W. G. Madow [2] and J. L. Doob [1] for the important case where $y_N$ is a constant.

The following example will illustrate Corollary 3. Let $x$, $y$ be normally and independently distributed random variables with mean 0 and variance 1, and let $\{z_N\}$, $\{z'_N\}$ be sequences of random variables with $\operatorname{plim}_{N\to\infty} \sqrt{N} z_N = \operatorname{plim}_{N\to\infty} \sqrt{N} z'_N = 1$. Let $x_N = x + z_N$ and $y_N = y + z'_N$. We consider the function $g(x, y) = x^3/3 + y^3/3 + 2x - 2y + 5$. Applying Corollary 1, it is easy to verify that $g(x_N, y_N) - g(x, y) = O_p(1/\sqrt{N})$, $z_N = O_p(1/\sqrt{N})$, and $z'_N = O_p(1/\sqrt{N})$. Hence, applying Corollary 3 for $j = 1$, we have

$$g(x_N, y_N) - g(x, y) - (x^2 + 2) z_N - (y^2 - 2) z'_N = o_p(1/\sqrt{N}).$$

Multiplying by $\sqrt{N}$, we have

$$[g(x_N, y_N) - g(x, y)]\sqrt{N} - [(x^2 + 2) z_N + (y^2 - 2) z'_N]\sqrt{N} = o_p(1).$$

This is equivalent to

$$\operatorname{plim}_{N\to\infty} \sqrt{N}\,[g(x_N, y_N) - g(x, y)] = x^2 + y^2.$$

Hence the distribution of $\sqrt{N}\,[g(x_N, y_N) - g(x, y)]$ converges to the chi-square distribution with 2 degrees of freedom.
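The example can be checked deterministically. Take the hypothetical choice $z_N = z'_N = 1/\sqrt{N}$, which satisfies $\operatorname{plim}\sqrt{N} z_N = \operatorname{plim}\sqrt{N} z'_N = 1$, and fix values of $x$ and $y$. Then the scaled difference $\sqrt{N}[g(x_N, y_N) - g(x, y)]$ should approach $x^2 + y^2$ as $N$ grows. The remaining error is of order $(x + y)/\sqrt{N}$.

```python
import math

def g(x, y):
    return x ** 3 / 3 + y ** 3 / 3 + 2 * x - 2 * y + 5

def scaled_difference(x, y, N):
    """sqrt(N) * [g(x + z, y + z) - g(x, y)] with z = 1/sqrt(N)."""
    z = 1.0 / math.sqrt(N)
    return math.sqrt(N) * (g(x + z, y + z) - g(x, y))

x, y = 0.7, -1.3
for N in (10**2, 10**4, 10**6):
    print(N, scaled_difference(x, y, N), x ** 2 + y ** 2)
```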

If $\operatorname{plim}_{N\to\infty} x_N = a$ and $\{\sigma_N\}$ is a sequence of numbers with $\lim_{N\to\infty} \sigma_N = 0$ such that $d_\infty[(x_N^i - a^i)/\sigma_N] = d(\xi_i)$, where the $\xi_i$ are constants or random variables, and if $g$ admits continuous first derivatives at $x = a$, at least one of which is different from 0, then, putting $(\partial g/\partial x^i)_{x=a} = g_i$, we have

$$g(x_N) - g(a) = g_1(x_N^1 - a^1) + \cdots + g_r(x_N^r - a^r) + o_p(\sigma_N).$$

Hence, applying Theorems 3 and 5, we have

$$\text{(i)} \qquad d_\infty\Bigl[\frac{g(x_N) - g(a)}{\sigma_N}\Bigr] = d(g_1\xi_1 + \cdots + g_r\xi_r).$$

That is to say, the distribution of $[g(x_N) - g(a)]/\sigma_N$ converges to the distribution of $\sum_{i=1}^{r} g_i\xi_i$ in all continuity points of the latter. A corresponding result can be obtained from Corollary 3 if all first derivatives are 0 at $x = a$ and at least one second derivative is different from 0, and so forth.

A method of deriving limiting distributions and limit standard deviations based on (i) is known as the $\delta$-method and has been extensively applied in the statistical literature.
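Here is a simulation sketch of (i) in the simplest case $r = 1$. The following are assumptions for the sketch:

- $x_N$ is the mean of $N$ exponential observations with mean $a = 1$ and variance 1.
- $\sigma_N = 1/\sqrt{N}$, so $\xi$ is standard normal.
- $g(x) = x^2$, so $g_1 = 2$.

The $\delta$-method then predicts that $\sqrt{N}[g(x_N) - g(a)]$ is approximately normal with mean 0 and variance $g_1^2 = 4$.

```python
import math
import random

def delta_method_sample(N=400, reps=4_000, seed=4):
    """Simulated [g(x_N) - g(a)] / sigma_N with g(x) = x**2, a = 1, sigma_N = 1/sqrt(N)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x_n = sum(rng.expovariate(1.0) for _ in range(N)) / N
        out.append((x_n ** 2 - 1.0) * math.sqrt(N))
    return out

values = delta_method_sample()
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, variance)   # the delta method predicts roughly 0 and 4
```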
ON A MEASURE PROBLEM ARISING IN THE THEORY OF
NON-PARAMETRIC TESTS


By Henry Scheffé
Princeton University

1. Introduction. While the contents of this paper have broader statistical implications, they were motivated by the following problem: Given two samples, $(Y_1, Y_2, \ldots, Y_m)$ and $(Z_1, Z_2, \ldots, Z_n)$, from univariate populations with cumulative distribution functions (c.d.f.'s) $F(x)$ and $G(x)$ respectively, and given furthermore that $F$ and $G$ are members of a certain class $\Omega$ of c.d.f.'s, to test the hypothesis that $F = G$. We shall refer to this as "the problem of two samples" [8]. It is an example of what Wolfowitz has called problems of the non-parametric case [8].

For the theory of non-parametric problems the following classification of c.d.f.'s is appropriate. Let $\Omega_0$ be the class of all univariate c.d.f.'s, that is, the class of all monotone non-decreasing functions $F(x)$ for which $F(-\infty) = 0$, $F(+\infty) = 1$, and $F(x) = F(x + 0)$. For every $F \in \Omega_0$ we may conceive of a corresponding random variable $X$ such that $\Pr\{X \le x\} = F(x)$. For some purposes we may desire to rule out the class $\Omega^{(0)}$ of degenerate c.d.f.'s given by the formula $F(x) = 0$ for $x < x_0$, $F(x) = 1$ for $x \ge x_0$, where $x_0$ is any real number. Let then $\Omega_1$ be the class of non-degenerate c.d.f.'s, $\Omega_1 = \Omega_0 - \Omega^{(0)}$. Let $\Omega_2$ be the class of all continuous $F(x)$, and let $\Omega_3$ be the class of all absolutely continuous $F(x)$, that is, all $F(x)$ for which there exists a probability density function (p.d.f.) $f(x)$ such that

$$(1) \qquad F(x) = \int_{-\infty}^{x} f(t)\,dt.$$

Finally, let $\Omega_4$ be the class of all $F(x)$ which may be expressed in the form (1) with $f(x)$ continuous.

Various solutions of non-parametric problems have been given under the restriction that the c.d.f.'s belong to one of the classes $\Omega_i$. For example, Kolmogoroff [2] has indicated how a confidence belt for an unknown $F$ may be formed with no assumptions on $F$, that is, $F \in \Omega_0$. Wald and Wolfowitz earlier¹ gave a more general solution of the same problem [5], and also of the problem of two samples [6], under the restriction that the c.d.f.'s are members of $\Omega_2$. The latter problem was considered by Dixon [1] for c.d.f.'s in $\Omega_4$. Wilks' theory of tolerance intervals [7] assumes $F \in \Omega_4$. The class $\Omega_1$ has been defined above because it is ordinarily the largest class of statistical interest. We note

$$(2) \qquad \Omega_0 \supset \Omega_1 \supset \Omega_2 \supset \Omega_3 \supset \Omega_4.$$

¹ See, however, a still earlier paper by Kolmogoroff [11] in which he gave the distribution theory required for his solution.
It is to be understood throughout that the word "region" (also the symbol $w$) always denotes a Borel set in a $k$-dimensional ($k > 1$) Euclidean sample space $W$. A "null set" will always mean a Borel set of measure zero.

Returning now to the problem of two samples, let $m + n = k$, $X_i = Y_i$ ($i = 1, 2, \ldots, m$), and $X_i = Z_{i-m}$ ($i = m + 1, \ldots, k$). Denote by $E$ the point $(X_1, \ldots, X_k)$. Proceeding along the lines of the usual parametric theory, we may seek a region $w$ (the "critical region") such that $\Pr\{E \in w\}$ is the same constant $\alpha$ ("significance level"; $\alpha \ne 0$ or 1) for all $F$ in a particular class $\Omega_i$ if $F = G$. This raises the following question. Define

$$P(w \mid F) = \int_w dF_k(x_1, \ldots, x_k), \qquad \text{where } F_k(x_1, \ldots, x_k) = \prod_{j=1}^{k} F(x_j).$$

We shall say that a region $w$ has the property $\pi_i$ if for all $F \in \Omega_i$, $\alpha = P(w \mid F)$ is independent of $F$ and $0 < \alpha < 1$. The question then is, for a fixed $i$, how can we characterize regions $w$ with the property $\pi_i$? Partial answers to this question are given in the next section.

In the language of measure theory the question is this: Let $m$ be any measure on the real line such that the measure of the whole line is unity, and form the "power" measure $\mu$ in Euclidean $k$-space, that is, the product measure obtained by using $m$ on each axis. For certain large classes $C_i$ of measures $m$ (corresponding to the $\Omega_i$ defined above, $i = 1, 2, 3, 4$), what can we say about the existence and structure of sets of points in $k$-space whose "power" measure is the same for all measures $m$ in $C_i$?

2. Theorems. Our first theorem tells us that if we want regions $w$ with the desired property, we must restrict $F$ to a smaller class than $\Omega_1$.

Theorem 1: There is no $w$ with the property $\pi_1$.

To prove the theorem, suppose the contrary. Then there exists a $w$ for which $P(w \mid F) = \alpha$ for all $F \in \Omega_1$, with $\alpha \ne 0$ or 1. Let $L$ be the line $x_1 = x_2 = \cdots = x_k$, and suppose first that there is a point $E_0$ of $L$ in $w$. Let $E_0 = (a, a, \ldots, a)$, and let $F_h(x)$ be any $F \in \Omega_1$ such that $\Pr\{X = a \mid F_h\} = h$ ($0 < h < 1$). Then

$$\alpha = P(w \mid F_h) \ge P(E_0 \mid F_h) = \Pr\{\text{all } X_i = a \mid F_h\} = \prod_{i=1}^{k} \Pr\{X_i = a \mid F_h\} = h^k.$$

By hypothesis $\alpha$ is independent of $h$. But $h$ may be chosen arbitrarily close to 1. Hence $\alpha = 1$, a contradiction. If no points of $w$ lie on $L$, the above reasoning applies to $w' = W - w$, since $\alpha' = P(w' \mid F) = 1 - \alpha$ is independent of $F \in \Omega_1$ and $w'$ contains an $E_0$ on $L$; therefore $\alpha' = 1$ and $\alpha = 0$.
In order to see what kind of structure might yield a $w$ of the desired type, let us for the moment consider the class $\Omega_3$ of c.d.f.'s. Then there exists a p.d.f. over $W$, namely $f(x_1)f(x_2)\cdots f(x_k)$. For any $f(x)$ and any point² $E$, this p.d.f. has the same value at all points $E'$ whose coordinates are permutations of the coordinates of $E$. This suggests that suitable regions $w$ can be built up by considering points $E$ for which no two coordinates are equal and putting a fixed fraction of the set $\{E'\}$ in $w$ in such a way that $w$ is a Borel set. Our next theorem justifies this process for the wider class $\Omega_2$.

Let us say that $w$ has the structure $S$ if, for every point $E = (x_1, \ldots, x_k)$ with no two coordinates equal, $M$ points ($0 < M < k!$) of the set $\{E'\}$ obtained by permuting the coordinates of $E$ are in $w$ and the remaining $k! - M$ are not.³

Theorem 2: A sufficient condition that $w$ have the property $\pi_2$ is that it have the structure $S$.

In proving the theorem it will be convenient to separate the $k!$ points of every set $\{E'\}$ by means of regions $u_i$ ($i = 1, \ldots, k!$) such that each $u_i$ contains one and only one point of $\{E'\}$. Order the $k!$ permutations of the integers $1, 2, \ldots, k$ in any manner such that $(1, 2, \ldots, k)$ is the first. Let $(p_{i1}, \ldots, p_{ik})$ be the $i$th permutation ($i = 1, 2, \ldots, k!$), and define $u_i$ as the region $x_{p_{i1}} < x_{p_{i2}} < \cdots < x_{p_{ik}}$. The collection $\{u_i\}$ is disjoint and covers all of $W$ except the set $H$ of points on the hyperplanes $x_i = x_j$ ($i \ne j$). The transformation $T_i: x_{p_{i1}}, \ldots, x_{p_{ik}} \to x_1, \ldots, x_k$ maps $u_i$ onto $u_1$ in such a way that $F_k$ remains invariant.

Suppose now that $w$ satisfies the conditions of the theorem. The removal of $H \cap w$ from $w$ does not⁴ affect $P(w \mid F)$ for any $F \in \Omega_2$. Hence

$$P(w \mid F) = \sum_{i=1}^{k!} P(w \cap u_i \mid F) = \sum_{i=1}^{k!} \int_{w \cap u_i} dF_k = \sum_{i=1}^{k!} \int_{u_i} c_{w \cap u_i}(E)\,dF_k,$$

where $c_S(E)$ denotes the characteristic function of a set $S$; that is, $c_S(E) = 1$ if $E \in S$ and 0 otherwise. Next map each of the regions $u_i$ onto $u_1$ by means of $T_i$. $F_k$ is invariant, while $c_{w \cap u_i}(E)$ goes into a function $h_i(E)$ such that $\sum_{i=1}^{k!} h_i(E) = M$ for $E \in u_1$. Then

$$P(w \mid F) = \sum_{i=1}^{k!} \int_{u_1} h_i(E)\,dF_k = \int_{u_1} \sum_{i=1}^{k!} h_i(E)\,dF_k = M \int_{u_1} dF_k.$$



² Previously $E$ denoted a random point $(X_1, \ldots, X_k)$; now it denotes an arbitrary point $(x_1, \ldots, x_k)$ in the sample space $W$. This will cause no confusion.

³ Regions of structure $S$ may be regarded as the result of applying R. A. Fisher's randomization process [10] in the most general possible way to the problem of two samples. Special cases of regions with structure $S$ have been considered by Feller [9] and Neyman [12], and are implied by all writers [e.g., 6] who have attacked the problem of two samples by the method of ranks.

⁴ This may be seen by writing $P(H \mid F)$ in the form of an integral over $W$ of $c_H(E)\,dF_k$, where $c_H(E)$ is the characteristic function of the set $H$, and applying the Fubini theorem [4].





But

$$1 = P(W \mid F) = \sum_{i=1}^{k!} \int_{u_i} dF_k,$$

and by use of $T_i$ we find

$$\int_{u_i} dF_k = \int_{u_1} dF_k \qquad (i = 1, \ldots, k!).$$

Hence

$$\int_{u_1} dF_k = 1/k!,$$

and

$$P(w \mid F) = M/k!$$

for all $F \in \Omega_2$. Thus $w$ has the property $\pi_2$.
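Here is a Monte Carlo sketch of Theorem 2 with $k = 3$. Take the hypothetical region $w = \{x_1 < x_2,\ x_1 < x_3\}$. It has structure $S$ with $M = 2$, because for distinct coordinates exactly two of the $3! = 6$ permutations of $E$ put the smallest value first. Hence $P(w \mid F) = 2/6 = 1/3$ for every continuous $F$. For contrast, a two-point $F$ lies outside $\Omega_2$ (ties have positive probability) and gives $P(w \mid F) = 1/8$. The three continuous laws and the sample sizes are assumptions.

```python
import random

def in_w(e):
    """Indicator of the illustrative region w: the first coordinate is strictly smallest."""
    return e[0] < e[1] and e[0] < e[2]

def estimate_p(draw, reps=100_000, seed=5):
    """Monte Carlo estimate of P(w | F) when X_1, X_2, X_3 are i.i.d. draws from F."""
    rng = random.Random(seed)
    return sum(in_w([draw(rng) for _ in range(3)]) for _ in range(reps)) / reps

continuous = {
    "uniform(0, 1)": lambda rng: rng.random(),
    "exponential(rate 2)": lambda rng: rng.expovariate(2.0),
    "normal(5, 3)": lambda rng: rng.gauss(5.0, 3.0),
}
estimates = {name: estimate_p(draw) for name, draw in continuous.items()}
two_point = estimate_p(lambda rng: float(rng.random() < 0.5))   # F not in Omega_2
print(estimates, two_point)
```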

$H$ is an example of a set in the class $N_2$ of regions $w$ for which $P(w \mid F) = 0$ for all $F \in \Omega_2$. If regions $w_1$ and $w_2$ differ by a set $w \in N_2$, then $P(w_1 \mid F) = P(w_2 \mid F)$ for all $F \in \Omega_2$. Hence we have

Corollary 1: A sufficient condition that $w$ have the property $\pi_2$ is that it differ from a region with structure $S$ by a region in $N_2$.

Defining similarly the class $N_3$ as the class of regions $w$ for which $P(w \mid F) = 0$ for all $F \in \Omega_3$, we see that $N_3$ is precisely the class of null sets.

Corollary 2: A sufficient condition that $w$ have the property $\pi_3$ is that it have the structure $S$ except for a null set.

The mildest restriction under which the writer has been able to construct a necessity proof is that the boundary of $w$ be a null set. This class of regions $w$ includes, to the best of his knowledge, all critical regions heretofore used in practice.

Theorem 3: For a $w$ whose boundary is a null set, a necessary condition that $w$ have the property $\pi_4$ is that it have the structure $S$ except on a null set.

Suppose then that $w$ has the property $\pi_4$ and that its boundary $B$ is a null set. Let $B_i$ be the transform of $B$ under $T_i$. Let the null set $H'$ be the union of $H$ with all $B_i$, and let $w_1 = w - H'$ and $w_2 = (W - w) - H'$. Then $w_1$ and $w_2$ are open sets, and $P(w_1 \mid F) = P(w \mid F)$ for all $F \in \Omega_4$. Furthermore, for any $E$, either all or none of the points of $\{E'\}$ are in $w_1 \cup w_2$. Now consider any $E_0 \in w_1$, and let $M_0$ be the number of points of $\{E'_0\}$ in $w_1$, so that $k! - M_0$ of $\{E'_0\}$ are in $w_2$. Let $E_0 = (\xi_1, \ldots, \xi_k)$ and $2\delta_1 = \min |\xi_i - \xi_j|$ for $i \ne j$. Since $w_1$ and $w_2$ are open, cubes with sides parallel to the coordinate hyperplanes ($x_j$ = constant) and edges of length $2\delta_2$ may be centered on the points $E'_0$ so that each cube is entirely in $w_1$ or entirely in $w_2$, by choosing $\delta_2$ sufficiently small. Choose $\delta$ so that $\delta > 0$, $\delta < \delta_1$, and $\delta < \delta_2$. The set $\{E'_0\}$ is a subset of the set $\{E''_0\}$ of $k^k$ points whose coordinates are in the set $\xi_1, \ldots, \xi_k$, allowing repetitions. For each point $E''_0 = (\xi_{i_1}, \ldots, \xi_{i_k})$ in $\{E''_0\}$, construct a cube $C_{i_1 \cdots i_k}$ as above with center at $E''_0$ and edge $2\delta$. These cubes are disjoint. Let $f_i(x)$ be a p.d.f. such that the corresponding c.d.f. is in $\Omega_4$ and $f_i(x) = 0$ for $|x - \xi_i| \ge \delta$ ($i = 1, \ldots, k$). Define the p.d.f.

$$f^{(s)}(x) = \frac{1}{s} \sum_{i=1}^{s} f_i(x) \qquad (s = 1, \ldots, k).$$

Then the corresponding c.d.f. is in $\Omega_4$. We have
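$$\alpha = P(w \mid F^{(s)}) = \int_w \prod_{j=1}^{k} f^{(s)}(x_j)\,dW = s^{-k} \int_w \sum_{i_1, \ldots, i_k = 1}^{s} f_{i_1}(x_1) \cdots f_{i_k}(x_k)\,dW,$$

where $dW = dx_1 \cdots dx_k$. Bring the last summation sign outside the integral sign, and note that $f_{i_1}(x_1) \cdots f_{i_k}(x_k) = 0$ outside $C_{i_1 \cdots i_k}$. Then

$$(3) \qquad \sum_{i_1, \ldots, i_k = 1}^{s} I_{i_1 \cdots i_k} = s^k \alpha,$$

where

$$(4) \qquad I_{i_1 \cdots i_k} = \int_{w \cap C_{i_1 \cdots i_k}} f_{i_1}(x_1) \cdots f_{i_k}(x_k)\,dW.$$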

Our argument depends on certain sums of the $I_{i_1 \cdots i_k}$ having the property that the sum is equal to $\alpha$ times the number of terms in the sum. To save space, we shall say that if $\Sigma$ is such a sum, then $\Sigma \in R$, $R$ being the class of such sums. Clearly all sums (3) are in $R$. Let $\{S_{r\rho}\}$ be the subsets of $r$ ($r = 1, \ldots, k$) different integers in the set $1, 2, \ldots, k$ ($\rho = 1, \ldots, {}_kC_r$), and let $\Sigma_{r\rho}$ be the sum of all $I_{i_1 \cdots i_k}$ whose index $i_1, \ldots, i_k$ consists only of integers in $S_{r\rho}$ and contains all the integers of $S_{r\rho}$. We wish to prove that $\Sigma_{k1}$, the sum of $I$ for cubes centered on the points of $\{E'_0\}$, is in $R$. To accomplish this we make an induction on $r$: if we assume all $\Sigma_{r\rho} \in R$ for $r < s$, then we can show all $\Sigma_{s\rho} \in R$ ($s = 2, \ldots, k$). No generality is lost in taking as $S_{s\rho}$ the set of integers $1, 2, \ldots, s$. Now consider the left member of (3). Some thought will show⁵ that it may be broken down into $\Sigma_{s\rho}$ plus a sum of $\Sigma_{r\rho}$ with $r < s$. But the left member of (3) is in $R$, and by hypothesis so are all $\Sigma_{r\rho}$ with $r < s$. It follows that $\Sigma_{s\rho}$ is also in $R$. To see that $\Sigma_{1\rho} \in R$ ($\rho = 1, \ldots, k$), let


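⁵ To illustrate the reasoning, suppose $s = 4$. If $S_{r\rho}$ is the set of (different) integers $a, b, \ldots, h$, denote $\Sigma_{r\rho}$ by $\langle a, b, \ldots, h \rangle$; that is, $\langle a, b, \ldots, h \rangle$ is the sum of all $I$ whose indices contain $a, b, \ldots, h$ and no other integers. Then the left member of (3) contains terms from $\langle 1, 2, 3, 4 \rangle$; $\langle 1, 2, 3 \rangle$, $\langle 1, 2, 4 \rangle$, $\langle 1, 3, 4 \rangle$, $\langle 2, 3, 4 \rangle$; $\langle 1, 2 \rangle$, $\langle 1, 3 \rangle$, $\langle 1, 4 \rangle$, $\langle 2, 3 \rangle$, $\langle 2, 4 \rangle$, $\langle 3, 4 \rangle$; $\langle 1 \rangle$, $\langle 2 \rangle$, $\langle 3 \rangle$, $\langle 4 \rangle$. Every term of the left member of (3) is in one of these sums $\langle\ \rangle$. No term can appear in two sums $\langle\ \rangle$. Every term of each sum $\langle\ \rangle$ appears in the left member of (3). Thus the left member is the sum of all the sums $\langle\ \rangle$ listed above, and by hypothesis all but the first sum $\langle\ \rangle$ are in $R$.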
$S_{1\rho}$ be $\rho$, and note that $\Sigma_{1\rho}$ consists only of $I_{\rho \cdots \rho}$. Putting $s = 1$ in (3), we have $I_{1 \cdots 1} = \alpha$, and likewise $\Sigma_{1\rho} = I_{\rho \cdots \rho} = \alpha$. Thus $\Sigma_{1\rho} \in R$.

At this stage we have $\Sigma_{k1} = k!\,\alpha$. But, as we already noted, of the cubes $C$ associated with the integrals $I$ in the sum $\Sigma_{k1}$, $M_0$ are entirely inside $w_1$ and $k! - M_0$ are entirely outside $w_1$. For the set of $M_0$ terms in $\Sigma_{k1}$ corresponding to the cubes $C$ in $w_1$, the region of integration $w \cap C$ in (4) is actually $C$. For the remaining terms in $\Sigma_{k1}$, the region of integration is the empty set. Furthermore, if $w \cap C = C$ in (4), the corresponding $I$ is unity. Hence $\Sigma_{k1} = M_0 = k!\,\alpha$, so $\alpha = M_0/k!$. If we now repeated the process with any other point $E_1 \in w_1$ instead of $E_0$, letting $M_1$ be the number of points of $\{E'_1\}$ in $w_1$, we would get $\alpha = M_1/k!$. Therefore $M_1 = M_0$. From $0 < \alpha < 1$ we conclude $0 < M_0 < k!$. Thus $w_1$ has the structure $S$.
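The bookkeeping of footnote 5 can be checked by brute force. The sketch below groups all index tuples $(i_1, \ldots, i_k)$ with entries in $\{1, \ldots, s\}$ by the exact set of integers they use, i.e. into the sums $\langle\ \rangle$. For $s = k = 4$ it finds the following:

- there are 15 groups, matching those listed in the footnote;
- the groups together account for all $4^4$ tuples;
- the group $\langle 1, 2, 3, 4 \rangle$ (the sum $\Sigma_{k1}$) has $4! = 24$ terms, one per point of $\{E'_0\}$.

The helper names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb, factorial

def support_classes(s, k):
    """Map each exact index set to the number of k-tuples over {1..s} using exactly that set."""
    return Counter(frozenset(t) for t in product(range(1, s + 1), repeat=k))

def surjections(k, r):
    """Number of k-tuples over an r-element set that use every element (inclusion-exclusion)."""
    return sum((-1) ** j * comb(r, j) * (r - j) ** k for j in range(r + 1))

classes = support_classes(4, 4)
print(len(classes), sum(classes.values()), classes[frozenset({1, 2, 3, 4})])
```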

The exceptional null set allowed for in the statement of Theorem 3 entered the proof when we removed $w \cap H'$ from $w$. Had we assumed that the boundary $B \in N_2$, then the exceptional set would be in $N_2$. As a corollary to the reasoning used in the proof we thus get

Corollary 3: If the boundary of $w$ is in $N_2$, a necessary condition that $w$ have the property $\pi_4$ is that $w$ have the structure $S$ except on a subset in $N_2$.

Finally, because of (2), any sufficient (necessary) condition for $w$ to have the property $\pi_i$ is sufficient (necessary) for $w$ to have the property $\pi_j$ if $j > i$ ($j < i$). Hence we may make the following replacements:

- $\pi_2$ in Theorem 2 and Corollary 1 by $\pi_3$ or $\pi_4$;
- $\pi_3$ in Corollary 2 by $\pi_4$;
- $\pi_4$ in Theorem 3 and Corollary 3 by $\pi_3$ or $\pi_2$.

This yields

Corollary 4: If the boundary of $w$ is a null set, a necessary and sufficient condition that $w$ have the property $\pi_3$ (or $\pi_4$) is that it have the structure $S$ except on a null set.

Corollary 5: If the boundary of $w$ is a region in $N_2$, a necessary and sufficient condition that $w$ have the property $\pi_2$ (or $\pi_3$ or $\pi_4$) is that it have the structure $S$ except on a subset in $N_2$.

3. Remarks. Wald and Wolfowitz [6, 8], in their work on the problem of two samples for the case $F \in \Omega_2$, have imposed the following restriction on any statistic used to test the null hypothesis: the statistic must be a function of $V$ only. The sequence $V$ of $k$ elements is formed as follows. Rank the $X_j$ of the sample in ascending order of magnitude (ignoring cases where two $X_j$ are equal). If the $i$th element in this rank order is a $Y$, put the $i$th element of $V$ equal to zero; otherwise put it equal to unity. This means that the resulting critical region always consists of the union of $s$ of the regions $u_i$ defined in section 2, where $s$ is a multiple of $m!\,n!$. The results of our section 2 show that this restriction is not necessary if all we require is that $\Pr\{E \in w\}$, where $w$ is the critical region and $E$ the sample point, be the same constant $\alpha$ whenever the null hypothesis is true. In fact, Pitman [3] has proposed a valid (but probably not very efficient) solution of the problem of two samples in which the statistic is not a function of $V$ only.

Putting further requirements on the critical region will lead to a more restricted class than the class of regions having essentially the structure $S$. For instance, it follows from section 2 that the significance level $\alpha$ can be any of the values $i/k!$ ($i = 1, \ldots, k! - 1$). But suppose we lay down a symmetry condition: if $(y_1, \ldots, y_m, z_1, \ldots, z_n)$ is in $w$, then all points obtainable by permuting the $y$'s among themselves and the $z$'s among themselves must also be in $w$. Then $\alpha$ must be a multiple of $m!\,n!/k!$. Again, suppose we impose the condition that any statistic $T(X_1, \ldots, X_k)$ used to test the null hypothesis remain invariant when all the $X_j$ are subjected to the same topological transformation of the real line onto itself. Then Wald and Wolfowitz [6] have shown that $T$ must be a function of $V$ only, so that $w$ has the special structure described above. The subject of statistical inference in the non-parametric case may be entering a stage of rapid development. At such a time it seems desirable to be clear about the assumptions necessary to restrict the critical region to a particular class.
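The counting behind these remarks can be checked directly for small samples. In the sketch below the sample sizes $m = 2$, $n = 3$ are arbitrary choices. Each ordering of the $k$ coordinates corresponds to one region $u_i$. Mapping each ordering to its sequence $V$ shows that there are $\binom{k}{m}$ distinct $V$'s, each arising from exactly $m!\,n!$ regions $u_i$. So a critical region that depends on $V$ only has size a multiple of $m!\,n!/k!$.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial

def v_sequence(order, m):
    """V for a ranking: order lists coordinate indices 0..k-1 from smallest to largest;
    indices < m are Y's (entry 0), the rest are Z's (entry 1)."""
    return tuple(0 if idx < m else 1 for idx in order)

def v_class_sizes(m, n):
    """Number of regions u_i (orderings of the k = m + n coordinates) giving each V."""
    return Counter(v_sequence(order, m) for order in permutations(range(m + n)))

sizes = v_class_sizes(2, 3)
print(len(sizes), set(sizes.values()))   # 10 distinct V's, each from 2! * 3! = 12 regions
```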

In concluding these remarks, we quote, with the kind permission of Dr. Wolfowitz, from some correspondence with the writer. Important work has been done on non-parametric tests under the restriction that the statistic used be invariant under topological transformation. The following statement of why this restriction might be imposed will therefore interest the reader: "… there are arguments pro and con … Pro: If the statistic be not invariant, this could happen: Two scientists working on the same problem and having the same observations to interpret might come to opposite conclusions if one used one scale of measurement and the other used a monotone function of that scale. Con: The criterion of topologic invariance of the statistic is a restriction on our freedom. Furthermore it cannot be imposed except in the univariate case" ([8], p. 270).
FURTHER RESULTS ON PROBABILITIES OF A FINITE NUMBER

OF EVENTS


By Kai Lai Chung
Tsing Hua University, Kunming, China

In a recent paper¹ the author has generalized some inequalities of Fréchet to the following:

Let $n \ge a \ge m \ge 1$, and let

(" I ”)" '■“«-» = 

$\Delta F(a) = F(a) - F(a + 1)$, $\Delta^k F(a) = \Delta(\Delta^{k-1} F(a))$;

then 

^ 0 , ^ 0 . 

Using a generalized Poincaré formula, P. L. Hsu has improved these inequalities to the recurrence formula stated below.

Hsu's formula is

(1) AAi“> = — ^ 4 

n — m 

Proof: We have 

For a fixed “a” summing over all (a) « {»),

- C J "- ;;; “)](6 - 

-(»■;) .S. «*’■■ G - ; - i)G - ir 

n m 


¹ "On the probability of the occurrence of at least m events among n arbitrary events," Annals of Math. Stat., Vol. 12 (1941), pp. 328-338. We use throughout the same notation as in that paper and in the paper referred to in footnote 3.
Applying the formula repeatedly, we obtain for $0 \le h \le n - a$,

')(” 1 ”)" AiS". 

Since every $A \ge 0$, we have, for $0 \le h \le n - a$,

^ 0 , 

which includes my former results. 

Further, we may write (1) as 

(2) (n - a)Pl”^ = (a + 1 - m)PlTl + 
or 

(a + - (n- a)Pi"‘^ = m = mP'^l 

It follows that 

(3) (a + l)Pi?l - (n- a)Pi”'^ ^ 0. 

From (2) it also follows that 

(4) (n - a)Pi’"' - (a + 1 - m)Pi?l ^ 0, 

which is the same as AAi”' ^ 0. Combining (3) and (4) we obtain 


^ ® D(m) ^ ^ 'fl Cl i>(m) 

i 7 ^ X a+l ^ i Z X a • 

a+1 a+1 — m 

If we take the special case $m = 1$ and, instead of the original events $E_1, \ldots, E_n$, consider their negations, we easily obtain

^ (:) - ^ - *•<«>}•

This is equivalent to a result given by Fréchet².

There is an analogue of Hsu's formula for $P_{[m]}$, as follows:

Let $n \ge a \ge m \ge 1$, and let


( n — mX-i 
a — m/ 


iplni] ^ 




It follows that for $0 \le h \le n - a$,




r'P'Tr* ; 


A»£[«i ^ 0. 


² "Événements compatibles et probabilités fictives," C. R. Acad. Sc., Vol. 208 (1939).
The other results on $p_m$ in the paper¹ also have analogues for $p_{[m]}$. For the result on conditions of existence, see the author's recent paper³. Here we shall state the following extension of Boole's inequality.

For $2l + 1 \le n - a$ and $2l \le n - a$ respectively, we have

E (-!)• g g E (-!)’■ 

,-o \ m / \ m / 

Proof: We have 

-S«+.((.')) = E 7 


Hence, 


t ') - s {§ (”‘™ 0 (” 1 

= PW(W) + E i-iy ' ^)pfn.+M((«')). 

The inequalities follow immediately. 
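The exact statement above is only partly legible in this copy. The sketch below therefore checks the classical form of this kind of result rather than the paper's own indexing. Let $S_j$ be the sum of the probabilities of all $j$-fold intersections ($S_0 = 1$), and let $P_{[m]}$ be the probability that exactly $m$ of the $n$ events occur. Then Jordan's formula

$$P_{[m]} = \sum_{\nu=0}^{n-m} (-1)^\nu \binom{m+\nu}{m} S_{m+\nu}$$

holds, and truncating the sum after an even (odd) $\nu$ gives an upper (lower) bound. The random finite probability space is a hypothetical test case, and the notation $S_j$ is ours.

```python
import random
from itertools import combinations
from math import comb

def random_space(n, outcomes=12, seed=6):
    """Hypothetical finite probability space: outcome weights and event memberships."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(outcomes)]
    total = sum(weights)
    probs = [w / total for w in weights]
    member = [[rng.random() < 0.5 for _ in range(n)] for _ in range(outcomes)]
    return probs, member

def S(j, probs, member, n):
    """Sum over all j-subsets J of P(every event in J occurs); S(0) = 1."""
    return sum(p for p, mem in zip(probs, member)
               for J in combinations(range(n), j) if all(mem[i] for i in J))

def exactly(m, probs, member):
    """P(exactly m of the events occur)."""
    return sum(p for p, mem in zip(probs, member) if sum(mem) == m)

def truncated(m, t, probs, member, n):
    """Partial sum over nu = 0..t of (-1)^nu * C(m + nu, m) * S_{m + nu}."""
    return sum((-1) ** nu * comb(m + nu, m) * S(m + nu, probs, member, n)
               for nu in range(t + 1))

n = 5
probs, member = random_space(n)
for m in range(n + 1):
    print(m, exactly(m, probs, member), truncated(m, n - m, probs, member, n))
```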

Finally, we record two formulas which express PaiM) in terms of 
and in terms of ((>')) for a fixed m and ranging 6’s. Formulas which express 
PiaiCCi')) in both ways have been given*. 

We have, 


(r-\) 


P((7)) = E (-1)^“” E p™({/3)) 

b-m m « (7) 


Hence 


(l_0'S«(W)= E E (-ir- E P»W)) 

\ni 1/ (t) • (*') b— (3) « (7) 

- t S P.((«) 

\C — 0/ (^) , (,) 

By a generalized Poincaré formula, we get

p.m = t (- D- .X, (- .)- (: : ;) {: zt){L- \y 
- <- (n \y 

³ "On fundamental systems of probabilities of a finite number of events," Annals of Math. Stat., Vol. 14 (1943), pp. 123-134.
which would depend for its answer on an extensive power function analysis. We shall not go into this analysis, however, but shall use C on intuitive grounds. This case will be referred to as the case of four intervals. Extending the method of the case of four intervals to any number of intervals presents no new difficulties in derivation; however, we shall confine our attention to the cases of two and four intervals.


2. The case of two intervals. Suppose $F(x)$ is a continuous distribution function with probability element $f(x)\,dx$. Let us draw a sample of size $2n + 1$ from a population having this probability element. Let the elements in the sample be $x_1, x_2, \ldots, x_{2n+1}$, ordered from least to greatest. The median of this sample will be $x_{n+1}$. Now consider a second sample of size $2m$, and let $m_1$ be the number of observations whose values are less than $x_{n+1}$. We call $m_2 = 2m - m_1$ the number of elements in the second sample greater than $x_{n+1}$.

Let $p = \int_{-\infty}^{x_{n+1}} f(x)\,dx$ be the probability of an observation having a value less than $x_{n+1}$. Then the probability of an element having a value greater than $x_{n+1}$ is $1 - p$. Thus we have the relation $f(x_{n+1})\,dx_{n+1} = dp$. The probability law of the median $x_{n+1}$, given by the multinomial law,¹ is

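(2) $$\Pr(x_{n+1})\,dx_{n+1} = \frac{(2n+1)!}{n!\,n!}\,p^{n}(1-p)^{n}\,dp.$$

The conditional probability law of $m_1$, given $x_{n+1}$, is then

(3) $$\Pr(m_1 \mid x_{n+1}) = \frac{(2m)!}{m_1!\,(2m-m_1)!}\,p^{m_1}(1-p)^{2m-m_1}.$$

From this it follows that the joint probability law of $x_{n+1}$ and $m_1$ is the product of (2) and (3), or

(4) $$\Pr(m_1, x_{n+1}) = \frac{(2n+1)!\,(2m)!}{n!\,n!\,m_1!\,(2m-m_1)!}\,p^{n+m_1}(1-p)^{n+2m-m_1}\,dp.$$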


We may integrate (4) with respect to $p$ from 0 to 1 as a Beta function, leaving a distribution of $m_1$ that is independent of the population probability element $f(x)\,dx$. We get for the distribution of $m_1$


¹ The multinomial law may be stated briefly as follows. If a trial results in one and only one of the mutually exclusive events $E_1, E_2, \ldots, E_k$, the probability $P$ that in a total of $n$ trials $n_1$ will result in $E_1$, $n_2$ in $E_2$, ..., $n_k$ in $E_k$ is given by

$$P = \frac{n!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k},$$

where $p_1, p_2, \ldots, p_k$ are the probabilities of a single trial resulting in $E_1, E_2, \ldots, E_k$ respectively.
(5) $$\Pr(m_1) = \frac{(2n+1)!\,(2m)!\,(n+m_1)!\,(n+2m-m_1)!}{n!\,n!\,m_1!\,(2m-m_1)!\,(2n+1+2m)!}.$$
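Formula (5) can be tabulated with the recursion mentioned in the text. As a sketch (not the author's own computation), dividing (5) at $m_1 + 1$ by (5) at $m_1$ gives $\Pr(m_1+1)/\Pr(m_1) = (n+m_1+1)(2m-m_1)/[(m_1+1)(n+2m-m_1)]$. The helper names `two_interval_pmf` and `lower_critical` are hypothetical, and the convention that values of $m_1$ below the critical value are significant follows the reading of Table I given in the text.

```python
from math import exp, lgamma


def two_interval_pmf(n, m):
    """Pr(m1) for m1 = 0, 1, ..., 2m, evaluated from formula (5).

    n: the first sample has 2n + 1 observations, with median x_{n+1}.
    m: the second sample has 2m observations.
    """
    # Pr(0) = (2n+1)! (n+2m)! / (n! (2n+2m+1)!), via log-gamma to avoid overflow.
    log_p0 = (lgamma(2 * n + 2) + lgamma(n + 2 * m + 1)
              - lgamma(n + 1) - lgamma(2 * n + 2 * m + 2))
    probs = [exp(log_p0)]
    for k in range(2 * m):
        # Pr(k + 1) / Pr(k), obtained by dividing (5) at k + 1 by (5) at k.
        ratio = (n + k + 1) * (2 * m - k) / ((k + 1) * (n + 2 * m - k))
        probs.append(probs[-1] * ratio)
    return probs


def lower_critical(probs, alpha):
    """Largest c with Pr(m1 < c) <= alpha; values of m1 below c are significant."""
    cumulative, c = 0.0, 0
    for k, p in enumerate(probs):
        if cumulative + p > alpha:
            break
        cumulative += p
        c = k + 1
    return c


probs = two_interval_pmf(5, 5)  # first sample of 11, second sample of 10
mean = sum(k * p for k, p in enumerate(probs))
variance = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
print(sum(probs), mean, variance, lower_critical(probs, 0.01))
```

For samples of 11 and 10 the mean and variance agree with $m = 5$ and $m(m+n+1)/(2n+3) = 55/13$, and the .01 critical value comes out as 1, the entry in the first row of Table I.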


From (5) a simple recursion relation between $\Pr(m_1)$ and $\Pr(m_1+1)$ may be determined, namely $\Pr(m_1+1)/\Pr(m_1) = (n+m_1+1)(2m-m_1)/[(m_1+1)(n+2m-m_1)]$, from which the probabilities of the various values of $m_1$ may be rapidly computed. For large samples it can be shown that, under certain regularity conditions, the ratio $[m_1 - E(m_1)]/\sigma_{m_1}$ may be approximated by the normal distribution² with zero mean and unit variance. The derivation is similar to that of the four-interval case, which is taken up in greater detail. It will be found by the use of (4) that the expected value of $m_1$ is $m$ and the variance of $m_1$ is $m(m+n+1)/(2n+3)$. Using this information, values of $m_1$ for various


TABLE I

The Case of Two Intervals

Lower and upper .01 and .05 percentage points for the distribution of $m_1$

  Sample sizes          Critical values of $m_1$
  First     Second      Lower            Upper
  2n + 1    2m          .01     .05      .05     .01
  11        10          1                        9
  41        40          10      12       28      30
  101       100         34      38       62      66
  101       200         72      80       120     128
  201       200         77      84       116     123
  201       400         160     181      219     240
  401       400         167     177      223     233
  401       800         353     367      433     447
  1001      1000        448     463      537     552

significance levels may be computed. The .01 and .05 percentage points of $m_1$ for several sample sizes are given in Table I. The values for sample sizes of 10 and 40 are computed directly from the probability law, while those for the larger samples are computed by the normal approximation. Thus, for two samples of sizes 101 and 100 respectively, a value of $m_1$ less than 38 would be significant at the .05 level. Similarly, at the upper .05 level, the hypothesis would be rejected if a value of $m_1$ greater than 62 were obtained. The upper limits could easily be dispensed with by testing with respect to the smaller of $m_1$ and $m_2$. However, for completeness, the upper percentage points


² This statement may be proved by showing that, as $m, n \to \infty$ such that $m/n$ remains constant, the limit of the moment generating function of the ratio is identical with the moment generating function of the normal distribution with zero mean and unit variance.
are included to show the range of values of $m_1$ in which the hypothesis that the two samples come from the same population may be accepted.
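The normal approximation used for the larger samples can be sketched as follows, assuming one-tail levels at each end and rounding $m - z\sigma_{m_1}$ to the nearest integer (the rounding rule is an assumption; the paper does not state it). With that reading it reproduces, for instance, the entries 34, 38, 62 and 66 for samples of 101 and 100. `normal_critical_values` is a hypothetical helper; `statistics.NormalDist` is from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def normal_critical_values(n, m, alpha):
    """Approximate (lower, upper) critical values of m1 at one-tail level alpha.

    n: first sample of 2n + 1; m: second sample of 2m.
    Uses E(m1) = m and Var(m1) = m(m + n + 1) / (2n + 3); rounding to the
    nearest integer is an assumption of this sketch.
    """
    sd = sqrt(m * (m + n + 1) / (2 * n + 3))
    z = NormalDist().inv_cdf(1 - alpha)  # upper alpha point of N(0, 1)
    lower = round(m - z * sd)
    upper = 2 * m - lower  # the law of m1 is symmetric about m
    return lower, upper


for first, second in [(101, 100), (201, 200), (1001, 1000)]:
    n, m = (first - 1) // 2, second // 2
    print(first, second,
          normal_critical_values(n, m, 0.01), normal_critical_values(n, m, 0.05))
```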


3. The case of four intervals. If we let the first sample, of size $4n + 3$, be designated by $(x_1, x_2, \ldots, x_{4n+3})$, assumed drawn from a population with probability element $f(x)\,dx$ and ordered from least to greatest, then the range of $x$ may be divided into four intervals by $x_{n+1}$, $x_{2n+2}$ and $x_{3n+3}$. The probability element of $x_{n+1}$, $x_{2n+2}$, $x_{3n+3}$ is




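$$\frac{(4n+3)!}{n!\,1!\,n!\,1!\,n!\,1!\,n!}\,[F(x_{n+1})]^{n}\,[F(x_{2n+2})-F(x_{n+1})]^{n}\,[F(x_{3n+3})-F(x_{2n+2})]^{n}\,[1-F(x_{3n+3})]^{n}\,f(x_{n+1})\,dx_{n+1}\,f(x_{2n+2})\,dx_{2n+2}\,f(x_{3n+3})\,dx_{3n+3},$$

where $F(x) = \int_{-\infty}^{x} f(t)\,dt$.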


TABLE II

The Case of Four Intervals

.95 and .99 percentage points for the distribution of C

  Sample sizes
  First     Second
  4n + 3    4m        n     m     C.95    C.99
  15        12        3     3     .446    .582
  63        60        15    15    .113    .161
  103       100       25    25    .072    .102


Let

$$\int_{-\infty}^{x_{n+1}} f(x)\,dx = p_1, \quad \int_{x_{n+1}}^{x_{2n+2}} f(x)\,dx = p_2, \quad \int_{x_{2n+2}}^{x_{3n+3}} f(x)\,dx = p_3, \quad \int_{x_{3n+3}}^{\infty} f(x)\,dx = p_4.$$

The probability element of $p_1$, $p_2$, $p_3$ and $p_4$ is

(6) $$\Pr(x_{i(n+1)}) = \frac{(4n+3)!}{n!\,1!\,n!\,1!\,n!\,1!\,n!}\,p_1^{n}p_2^{n}p_3^{n}p_4^{n}\,dp_1\,dp_2\,dp_3.$$


Now let us consider the second sample, $(x_1', x_2', \ldots, x_{4m}')$, of size $4m$. Let the number of observations falling in each of the preassigned intervals be $m_i$ $(i = 1, 2, 3, 4)$, where $m_4 = 4m - m_1 - m_2 - m_3$. The conditional probability of the $m_i$, given the values of $x_{i(n+1)}$, is also determined by the multinomial law:


(7) $$\Pr(m_i \mid x_{i(n+1)}) = \frac{(4m)!}{m_1!\,m_2!\,m_3!\,m_4!}\,p_1^{m_1}p_2^{m_2}p_3^{m_3}p_4^{m_4}.$$

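The joint distribution of the $p_i$ and the $m_i$ is then

(8) $$\Pr(p_i, m_i) = \frac{(4n+3)!\,(4m)!}{(n!)^4\,m_1!\,m_2!\,m_3!\,m_4!}\,p_1^{n+m_1}p_2^{n+m_2}p_3^{n+m_3}p_4^{n+m_4}\,dp_1\,dp_2\,dp_3.$$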
To obtain the distribution of the $m_i$ alone, the $p_i$ will be integrated out by the Dirichlet integral³ formula, giving a distribution which is clearly independent of the population distribution $f(x)$:

(9) $$\Pr(m_i) = \frac{(4n+3)!\,(4m)!\,(n+m_1)!\,(n+m_2)!\,(n+m_3)!\,(n+m_4)!}{(n!)^4\,m_1!\,m_2!\,m_3!\,m_4!\,(4m+4n+3)!}.$$

To find the expected value of the $m_i$, the probability law of $m_1$ will first be derived. The probability function for the value of $x_{n+1}$ is

(10) $$\Pr(x_{n+1})\,dx_{n+1} = \frac{(4n+3)!}{n!\,(3n+2)!}\,p_1^{n}(1-p_1)^{3n+2}\,dp_1.$$

Then we have the conditional probability

(11) $$\Pr(m_1 \mid x_{n+1}) = \frac{(4m)!}{m_1!\,(4m-m_1)!}\,p_1^{m_1}(1-p_1)^{4m-m_1}$$

and

(12) $$\Pr(x_{n+1}, m_1) = \frac{(4n+3)!\,(4m)!}{n!\,(3n+2)!\,m_1!\,(4m-m_1)!}\,p_1^{n+m_1}(1-p_1)^{3n+2+4m-m_1}\,dp_1.$$

To obtain the expected value of $m_1$, the joint distribution of $m_1$ and $p_1$ is multiplied by $m_1$, summed on $m_1$ from 0 to $4m$, and integrated on $p_1$ from 0 to 1:

(13) $$E(m_1) = \frac{(4n+3)!}{n!\,(3n+2)!}\int_0^1 p_1^{n}(1-p_1)^{3n+2}\left[\sum_{m_1=0}^{4m} m_1\,\frac{(4m)!}{m_1!\,(4m-m_1)!}\,p_1^{m_1}(1-p_1)^{4m-m_1}\right]dp_1.$$




This interchange of the order of integration and summation is clearly valid. The quantity in brackets will be recognized as the first moment of the binomial distribution $(p_1 + q_1)^{4m}$, where $q_1 = 1 - p_1$; it equals $4mp_1$. Therefore we have

(14) $$E(m_1) = \int_0^1 4m\,p_1\,f(p_1)\,dp_1 = 4m\,E(p_1),$$

where $f(p_1)$ denotes the density of $p_1$ given by (10).


$E(p_1)$ and the higher moments of $p_1$ are found in the usual way by integrating the distributions as Beta functions; in particular $E(p_1) = (n+1)/(4n+4) = 1/4$. From this we see that the expected value of $m_1$ is $m$. By repeating these operations on $m_2$, $m_3$ and $m_4$, it can be seen that $E(m_i) = m$, which also validates the statement made in the introduction.
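The statement $E(m_i) = m$ can also be checked by brute force from (9). This is a sketch, not part of the paper: the hypothetical helper `four_interval_pmf` enumerates every $(m_1, m_2, m_3, m_4)$ with sum $4m$ in exact rational arithmetic, so the total probability and the four means can be compared exactly.

```python
from fractions import Fraction
from math import factorial


def four_interval_pmf(n, m):
    """Exact joint law (9) of (m1, m2, m3, m4), as a dict of Fractions.

    n: first sample of 4n + 3 (cut points x_{n+1}, x_{2n+2}, x_{3n+3}).
    m: second sample of 4m.
    """
    N = 4 * m
    const = Fraction(factorial(4 * n + 3) * factorial(N),
                     factorial(n) ** 4 * factorial(N + 4 * n + 3))
    pmf = {}
    for m1 in range(N + 1):
        for m2 in range(N + 1 - m1):
            for m3 in range(N + 1 - m1 - m2):
                cells = (m1, m2, m3, N - m1 - m2 - m3)
                p = const
                for k in cells:
                    p *= Fraction(factorial(n + k), factorial(k))  # (n + k)! / k!
                pmf[cells] = p
    return pmf


pmf = four_interval_pmf(2, 2)  # first sample of 11, second sample of 8
total = sum(pmf.values())
means = [sum(p * cells[i] for cells, p in pmf.items()) for i in range(4)]
print(total, means)
```

Exact fractions are used because the factorials grow quickly and floating-point sums would blur the equality $E(m_i) = m$.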


³ A discussion of the Dirichlet integral may be found in Woods, Advanced Calculus, p. 167. It may be stated as follows for the problem in which we are interested:

$$\iiint x^{l-1}y^{m-1}z^{n-1}(1-x-y-z)^{r-1}\,dx\,dy\,dz = \frac{\Gamma(l)\,\Gamma(m)\,\Gamma(n)\,\Gamma(r)}{\Gamma(l+m+n+r)},$$

where we integrate over the region bounded by $x + y + z = 1$ and the three coordinate planes.
We have previously presented the criterion (1).

The next problem is to find a distribution function to which the distribution of $C$ may be fitted. A reasonable choice appears to be the Pearson Type I curve

(15) $$f(C) = \frac{\Gamma(r+s)}{\Gamma(r)\,\Gamma(s)}\,C^{r-1}(1-C)^{s-1}, \qquad 0 \le C \le 1.$$


The distribution of $C$ is fitted by equating the first two moments of the two distributions and solving for the constants $r$ and $s$ of the Type I distribution. Using the theorem that the mean value of a sum of variates is equal to the sum of their mean values, we have

(16) $$E(C) = \frac{1}{9m^2}\left[E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2) - 4m^2\right].$$


Also the second moment may be written as

(17) $$E(C^2) = \frac{1}{81m^4}\Big[E(m_1^4) + E(m_2^4) + E(m_3^4) + E(m_4^4) + 16m^4 + 2E(m_1^2m_2^2) + 2E(m_1^2m_3^2) + 2E(m_1^2m_4^2) + 2E(m_2^2m_3^2) + 2E(m_2^2m_4^2) + 2E(m_3^2m_4^2) - 8m^2\{E(m_1^2) + E(m_2^2) + E(m_3^2) + E(m_4^2)\}\Big].$$


The expected value of $m_1^2$ is found in the same manner as $E(m_1)$, and here also it can be shown that the $E(m_i^2)$ are all equal. The same procedure holds for $E(m_i^4)$.

(18) $$E(m_1^2) = m + \frac{m(4m-1)(n+2)}{4n+5},$$
$$E(m_1^4) = m + \frac{7m(4m-1)(n+2)}{4n+5} + \frac{6m(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)} + \frac{m(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)}.$$


By using the moment generating function of the trinomial distribution, the $E(m_i^2m_j^2)$ may also be found in a similar manner:

(19) $$E(m_1^2m_2^2) = \frac{m(4m-1)(n+1)}{4n+5} + \frac{2m(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)} + \frac{m(4m-1)(4m-2)(4m-3)(n+2)(n+1)(n+2)}{(4n+7)(4n+6)(4n+5)}.$$


As a result we have

(20) $$E(C) = \frac{4}{9m} + \frac{4(4m-1)(n+2)}{9m(4n+5)} - \frac{4}{9}.$$
Let $E(C) = A$ to simplify later relations to be computed. Finally

(21) $$E(C^2) = \frac{4}{81m^3}\Big[1 + \frac{7(4m-1)(n+2)}{4n+5} + \frac{6(4m-1)(4m-2)(n+3)(n+2)}{(4n+6)(4n+5)} + \frac{(4m-1)(4m-2)(4m-3)(n+4)(n+3)(n+2)}{(4n+7)(4n+6)(4n+5)} + \frac{3(4m-1)(n+1)}{4n+5} + \frac{6(4m-1)(4m-2)(n+1)(n+2)}{(4n+6)(4n+5)} + \frac{3(4m-1)(4m-2)(4m-3)(n+2)^2(n+1)}{(4n+7)(4n+6)(4n+5)} + 4m^3 - 8m^2 - \frac{8m^2(4m-1)(n+2)}{4n+5}\Big].$$

To simplify later relations we let $E(C^2) = B$.

The first two moments of the Type I distribution are easily found to be

(22) $$\mu_1' = \frac{r}{r+s} = A, \qquad \mu_2' = \frac{\mu_1'(r+1)}{r+s+1} = B.$$


Solving these two simultaneous equations for $r$ and $s$,

(23) $$r = \frac{A(A-B)}{B-A^2}, \qquad s = \frac{1-A}{A}\,r.$$

A number of percentage points for the Type I distribution have been computed by Miss Catherine Thompson [3]. Using these limits, the hypothesis that the two samples come from the same population may be accepted or rejected.
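The fitting step can be sketched in exact rational arithmetic: evaluate $A$ from (20) and $B$ from (21), then $r$ and $s$ from (23). `type_I_fit` is a hypothetical helper name. The sketch stops at $r$ and $s$; the percentage points themselves would still come from Thompson's tables [3] or from a Beta quantile routine (for example `scipy.stats.beta.ppf`, which this sketch does not call).

```python
from fractions import Fraction


def type_I_fit(n, m):
    """Return A = E(C) and B = E(C^2) from (20)-(21), and r, s from (23).

    n, m: first sample of 4n + 3 and second sample of 4m observations.
    """
    a1 = Fraction(4 * m - 1)
    a2 = a1 * (4 * m - 2)
    a3 = a2 * (4 * m - 3)
    d1 = Fraction(4 * n + 5)
    d2 = d1 * (4 * n + 6)
    d3 = d2 * (4 * n + 7)

    # (20): E(C)
    A = Fraction(4, 9 * m) + 4 * a1 * (n + 2) / (9 * m * d1) - Fraction(4, 9)

    # (21): E(C^2) = 4 / (81 m^3) times the bracket below
    bracket = (1
               + 7 * a1 * (n + 2) / d1
               + 6 * a2 * (n + 3) * (n + 2) / d2
               + a3 * (n + 4) * (n + 3) * (n + 2) / d3
               + 3 * a1 * (n + 1) / d1
               + 6 * a2 * (n + 1) * (n + 2) / d2
               + 3 * a3 * (n + 2) ** 2 * (n + 1) / d3
               + 4 * m ** 3 - 8 * m ** 2
               - 8 * m ** 2 * a1 * (n + 2) / d1)
    B = Fraction(4, 81 * m ** 3) * bracket

    # (23): match r/(r+s) = A and r(r+1)/((r+s)(r+s+1)) = B
    r = A * (A - B) / (B - A * A)
    s = (1 - A) * r / A
    return A, B, r, s


for n, m in [(3, 3), (15, 15), (25, 25)]:  # the sample sizes of Table II
    A, B, r, s = type_I_fit(n, m)
    print(n, m, float(A), float(B), float(r), float(s))
```

Keeping everything as `Fraction` makes it easy to confirm that the fitted $r$ and $s$ reproduce $A$ and $B$ exactly through (22).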

Table II shows the .95 and .99 percentage points of C for three sample sizes. 


4. Summary. The problem considered here is that of devising a simple 
method of testing the hypothesis that two samples are from identical populations 
having continuous distribution functions. It may be summarized briefly as 
follows. The first sample is used to establish any desired number of intervals 
into which the observations of the second sample may fall. A test criterion is 
proposed which is based on the deviations of the numbers of elements of the 
second sample which fall in the intervals from the expected values of the respective numbers. Two cases are discussed, that of two intervals and that of four
intervals, making use of the median and quartiles in the first sample to deter- 
mine the intervals. Tables of 1% and 5% points for several sample sizes of 
both cases are given. 
NOTES

This section is devoted to brief research and expository articles, notes on methodology and other short items.

NOTE ON THE INDEPENDENCE OF CERTAIN QUADRATIC FORMS 

By Allen T. Craig
University of Iowa 

Various approaches to the problem of the independence of quadratic forms 
in normally and independently distributed variables have been made by R. A. 
Fisher, Cochran, Madow and others. It is the purpose of this note to point 
out a few simple propositions which, in so far as the writer is aware, have not 
had specific mention in the literature. 

1. Independence of certain quadratic forms. Theorem 1: A necessary and
sufficient condition that two real symmetric quadratic forms, in n normally and 
independently distributed variables, be independent in the probability sense is that 
the product of the matrices of the forms be zero. 

Let the chance variable $x$ be normally distributed with mean zero and unit variance. Let $x_1, x_2, \ldots, x_n$ be $n$ independent values of $x$, and let $A$ and $B$ be two real symmetric matrices, each of order $n$. Write $Q_1 = \sum a_{ij}x_ix_j$ and $Q_2 = \sum b_{ij}x_ix_j$, where $\|a_{ij}\| = A$ and $\|b_{ij}\| = B$. It is well known that the generating function of the moments of the joint distribution of $Q_1$ and $Q_2$ can be written

$$G(\lambda, \lambda') = |I - \lambda A - \lambda' B|^{-1/2},$$

so that

(1) $$|I - \lambda A - \lambda' B| = |I - \lambda A|\,|I - \lambda' B|$$

for all real values of $\lambda$ and $\lambda'$ is necessary and sufficient for the independence of $Q_1$ and $Q_2$.

If $Q_1$ and $Q_2$ are independent, then (1), being true for all real values of $\lambda$ and $\lambda'$, is in particular true for $\lambda = \lambda'$. Thus

(2) $$|I - \lambda(A + B)| = |I - \lambda A|\,|I - \lambda B|.$$

Denote by $r_1$, $r_2$ and $r \le r_1 + r_2$ respectively the ranks of $A$, $B$ and $A + B$. Then $r = r_1 + r_2$, since (2) expresses the identity of two polynomials in $\lambda$ of degrees $r$ and $r_1 + r_2$.

Further, if we write

$$|I - \lambda A| = (1 - \lambda p_1)\cdots(1 - \lambda p_{r_1}),$$
$$|I - \lambda B| = (1 - \lambda q_1)\cdots(1 - \lambda q_{r_2}),$$

and $|I - \lambda(A + B)| = (1 - \lambda s_1)\cdots(1 - \lambda s_r)$, then, because the factorization of polynomials is unique, each $s_j$ can be paired with one of the numbers $p_1, \ldots, p_{r_1}, q_1, \ldots, q_{r_2}$. Thus, if $Q_1$ and $Q_2$ are independent, the rank of $A + B$ is the sum of the ranks of $A$ and $B$, and the non-zero roots of the characteristic equation of $A + B$ are those of the characteristic equation of $A$ together with those of the characteristic equation of $B$. There exists an appropriately chosen orthogonal matrix $L$ of order $n$ such that $L'(A + B)L$, $L'$ being the conjugate (transpose) of $L$, is a matrix with the numbers $p_1, \ldots, p_{r_1}, q_1, \ldots, q_{r_2}$ on the principal diagonal and zeros elsewhere. Then $L'AL$ and $L'BL$ have no overlapping non-zero elements and $L'AL\,L'BL = 0$. But $L'$ is the inverse of $L$. Hence, upon multiplying both members of the preceding equation on the right by $L'$ and on the left by $L$, we have $AB = 0$. Since $A = A'$ and $B = B'$, likewise $BA = 0$.

Conversely, suppose $AB = 0$. Then, since $\lambda\lambda' AB = 0$, the matrix $(I - \lambda A)(I - \lambda' B) = I - \lambda A - \lambda' B$. These matrices being equal, their determinants are equal, and the condition (1) for the independence of $Q_1$ and $Q_2$ is satisfied.
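A small exact check of Theorem 1 (an illustration, not part of the note) uses the classical pair $A = J/n$, so that $Q_1 = n\bar{x}^2$, and $B = I - J/n$, so that $Q_2 = \sum (x_i - \bar{x})^2$, where $J$ is the matrix of ones. Here $AB = 0$, and the determinant condition (1) holds at any rational $\lambda, \lambda'$; as a contrast, $Q = x_1^2$ paired with the same $A$ has a non-zero product and violates (1). The helpers `matmul`, `det` and `i_minus` are hypothetical names written for this sketch.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return result


def i_minus(lam, A, mu=0, B=None):
    """The matrix I - lam*A - mu*B (B omitted means the zero matrix)."""
    n = len(A)
    if B is None:
        B = [[0] * n for _ in range(n)]
    return [[int(i == j) - lam * A[i][j] - mu * B[i][j] for j in range(n)]
            for i in range(n)]


n = 3
A = [[Fraction(1, n)] * n for _ in range(n)]                          # Q1 = n * mean^2
B = [[int(i == j) - A[i][j] for j in range(n)] for i in range(n)]     # Q2 = sum of squared deviations

print(matmul(A, B) == [[0] * n for _ in range(n)])                    # AB = 0
lam, mu = Fraction(1, 2), Fraction(-3, 4)
print(det(i_minus(lam, A, mu, B)) == det(i_minus(lam, A)) * det(i_minus(mu, B)))

# Contrast: Q = x_1^2 with the same A; their product is not zero and (1) fails.
C = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
h = Fraction(1, 2)
print(det(i_minus(h, C, h, A)), det(i_minus(h, C)) * det(i_minus(h, A)))
```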

The theorem is readily extended to the case of the mutual independence of any finite number of such quadratic forms.

The product of a non-singular matrix and a matrix of rank $R$ is a matrix of rank $R$; hence $AB \ne 0$ whenever $A$ is non-singular and $B \ne 0$. Therefore every non-singular quadratic form of the kind here discussed is correlated with every non-identically vanishing quadratic form in the same variables.

2. Conditions for independent Chi-Square distributions. The preceding theorem enables one to determine, by multiplication of matrices, whether real symmetric quadratic forms in normally and independently distributed variables are themselves independent in the probability sense. The following theorem affords a simple test as to whether the distributions are of the Chi-Square type.

Theorem 2: Necessary and sufficient conditions that each of two real symmetric 
quadratic forms, in n normally and independently distributed variables with mean 
zero and unit variance, be independently distributed as is Chi-Square, are that 
the product of the matrices of the forms be zero and that each matrix equal its own 
square. 

If $Q_1$ and $Q_2$ are independently distributed as is Chi-Square, then $AB = 0$ and each of the non-zero roots of the characteristic equations of $A$ and $B$ is $+1$. For an appropriately chosen orthogonal matrix $L$ of order $n$, $L'AL$ is a matrix with $r_1$ elements on the principal diagonal equal to $+1$, all other elements being zero. For such a matrix it is seen that $(L'AL)(L'AL) = L'A^2L = L'AL$, and hence $A^2 = A$. A similar argument shows that $B^2 = B$.

Conversely, if $AB = 0$, then $Q_1$ and $Q_2$ are independent. Further, if $A^2 = A$ and $B^2 = B$, each of the non-zero roots of the characteristic equations of $A$ and $B$ is $+1$. This follows from the fact that the roots of the characteristic equation of the square of any matrix are themselves the squares of the roots of the characteristic equation of that matrix. Since $A$ and $B$ are real and symmetric, the roots under consideration are real. Thus $Q_1$ and $Q_2$ have independent Chi-Square distributions with $r_1$ and $r_2$ degrees of freedom respectively.

This theorem can likewise be extended to any finite number of these quadratic forms.

Of special interest is the case of, say, $k$ quadratic forms for which the sum of the $k$ matrices is the identity matrix. Thus $A_1 + A_2 + \cdots + A_k = I$. By Theorem 1, it is both necessary and sufficient for the mutual independence of the $k$ forms that $A_uA_v = 0$, $u \ne v$.

Now

$$A_j = I - A_1 - \cdots - A_{j-1} - A_{j+1} - \cdots - A_k$$

and

$$A_j^2 = A_jA_j = A_j - A_1A_j - \cdots - A_{j-1}A_j - A_{j+1}A_j - \cdots - A_kA_j = A_j,$$

so that $A_j = A_j^2$. In this particular case it is to be seen that the mutual independence of the forms implies that their several distributions are of the Chi-Square type.
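The special case $A_1 + A_2 = I$ can be illustrated with $A_1 = J/n$ and $A_2 = I - J/n$. The sketch below, which is not from the note, checks the conditions of Theorem 2 exactly, reads the degrees of freedom off the traces (for an idempotent matrix the trace equals the rank), and runs a seeded simulation whose sample means of $Q_1$ and $Q_2$ should lie near those degrees of freedom. The helper names and the seed are arbitrary choices for this sketch.

```python
import random
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def quad_form(M, x):
    """Q = sum over i, j of m_ij * x_i * x_j."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


n = 4
identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
A1 = [[Fraction(1, n)] * n for _ in range(n)]                            # Q1 = n * mean^2
A2 = [[identity[i][j] - A1[i][j] for j in range(n)] for i in range(n)]   # A1 + A2 = I

# Conditions of Theorem 2: the product is zero and each matrix equals its square.
assert matmul(A1, A2) == [[0] * n for _ in range(n)]
assert matmul(A1, A1) == A1 and matmul(A2, A2) == A2

# For an idempotent matrix the trace equals the rank, i.e. the degrees of freedom.
df1 = sum(A1[i][i] for i in range(n))
df2 = sum(A2[i][i] for i in range(n))

# Seeded simulation with independent N(0, 1) variables.
rng = random.Random(1943)
A1f = [[float(v) for v in row] for row in A1]
A2f = [[float(v) for v in row] for row in A2]
trials = 20000
q1_total = q2_total = 0.0
for _ in range(trials):
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    q1_total += quad_form(A1f, x)
    q2_total += quad_form(A2f, x)
print(df1, df2, q1_total / trials, q2_total / trials)
```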


A CHARACTERIZATION OF THE NORMAL DISTRIBUTION 

By Irving Kaplansky 
Harvard University 

In 1925 R. A. Fisher gave a geometric derivation of the joint distribution of mean and variance in samples from a normal population (Metron, Vol. 5, pp. 90-104). On examining the argument, however, we find that an (apparently) more general result is actually established: if $f(x_1)\cdots f(x_n)$ is a function $g(m, s)$ of the sample mean $m$ and standard deviation $s$, then the probability density of
m and s in samples of n from the population f{x) is g{m, This condition 

on $f(x)$ is of course satisfied if $f(x)$ is normal; in this note we shall conversely show that for $n \ge 3$ it characterizes the normal distribution. In the proof it will be assumed that $g(m, s)$ possesses partial derivatives of the first order, although a weaker assumption would probably suffice.

Let us for the moment restrict the variables $x_i$ to values such that $f(x_i) > 0$. After a change of notation we have

$$\phi(x_1) + \cdots + \phi(x_n) = h(u, v),$$

where $\phi = \log f$, $u = x_1 + \cdots + x_n$, $v = \tfrac{1}{2}(x_1^2 + \cdots + x_n^2)$. A differentiation yields

$$\phi'(x_i) = h_u + x_i h_v \qquad (i = 1, \ldots, n).$$
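Solving two of these equations for $h_v$, we find

(1) $$h_v = \frac{\phi'(x_i) - \phi'(x_j)}{x_i - x_j} \qquad (i \ne j),$$

and, for $n \ge 3$, it follows that the right member of (1) is a constant, say $2A$: the same $h_v$ is also given by the pair $(i, k)$, which does not involve $x_j$, and by the pair $(k, j)$, which does not involve $x_i$. Then

$$\phi'(x_i) - 2Ax_i = \phi'(x_j) - 2Ax_j = \text{a constant } B,$$
$$\phi(x) = Ax^2 + Bx + C.$$

We now have $f(x) = e^{Ax^2 + Bx + C}$ whenever $f(x) > 0$; but since $f(x)$ is continuous, this implies $f(x) = e^{Ax^2 + Bx + C}$ everywhere.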
Cudmore, Sedley A. M.A. (Oxford) Stat., Dominion Bur. of Stat., Ottawa, Canada.

Cureton, Edward E. Ph.D. (Columbia) Sr. Personnel Technician, War Dept., RFD 1, Tauxemont, Alexandria, Va.

De Castro, Prof. Lauro S. V. Civ. Eng. (Escola Nacional de Enginharia) Catholic Univ., 
Rio de Janeiro, Brazil. 62 rua David Campista. 

Edwards, G.D. A.B. (Harvard) Dir. of Quality Assurance, Bell Telephone Laboratories,
463 West St., New York, N. Y. 

Gifford, Kenneth R. Student, Mass. Inst. Tech., Cambridge, Mass. 97 Bay State Rd., 
Boston, Mass. 
Gottfried, Bert A. A.M. (Columbia) Stat. Clerk, 4300 Kaywood Dr., Mt. Ranier, Md.

Hamilton, Prof. Thomas R. Ph.D. (Columbia) Texas A. & M. Coll., College Station, Tex. 

Heide, J. D. M.S. (Iowa) Stat., U. S. Rubber Co., 1324 Altoona Ave., Eau Claire, Wis.

Hilfer, Irma. M.A. (Columbia) Actuary, N. Y. C. Board of Transportation, 165 W. 97 St.,
New York, N. Y. 

Howell, John M. B.A. (UCLA) Stat., Northrop Aircraft Inc., Hawthorne, Calif. 4H0 
W. 63 St., Los Angeles, Calif. 

Hurwicz, Leonid. L.L.M. (Warsaw) Res. Asso., Cowles Comm., Univ. of Chicago, Chicago, Ill.

Kendall, Maurice G. M.A. (Cambridge) Stat., Chamber of Shipping of the United King- 
dom, Richmond House, Aldenham Rd., Bushey, Eng. 

Klein, Lawrence R. B.A. (California) Teaching Fellow, Mass. Inst. Tech., Cambridge,
Mass. 

Kuznets, George M. Ph.D. (California) Instr., Giannini Foundation, Univ. of Calif., 
Berkeley, Calif. 

Landau, H. G. M.S. (Carnegie Inst. Tech.) Stat. Analyst, War Dept., Washington, D. C. 
2408 20 St., N.E. 

Langmuir, Charles R. Ed.M. (Harvard) Carnegie Foundation. 4S7 West 59 St., New 
York, N. Y. 

Levy, Henry C. L.L.B. (Fordham) Instr., N. Y. C. C., New York, N. Y. 600 West 116 St. 

Li, Jerome C. R. B.S. (Nanking) Student, Iowa State Coll., Ames, Iowa. 2184 Lincoln 
Way. 

Lieberman, Jacob E. B.S. (Brooklyn Coll.) Jr. Stat., Census Bureau, Washington, D. C. 
2422 14 St., N. E. 

Martin, Margaret ?• M.A. (Minnesota) Instr., Columbia Univ., New York, N. Y. 1230 
Amsterdam Ave. 

Nash, Stanley W. B.A. (Coll. of Puget Sound) San Joaquin Experimental Range, O'Neals,
Calif. 

Norton, Horace W. Ph.D. (London) Sr. Meteorologist, U. S. Weather Bur., Washington, D. C. 3118 North First Rd., Arlington, Va.

Olds, Edward B. Ph.D. (Pittsburgh) Stat., Curtiss-Wright Corp. 298 Niagra Falls
Blvd., Buffalo, N. Y. 

Preston, Bernard. C.P.A., 103 Park Ave., New York, N. Y. 

Rosenblatt, David. B.S. (Coll. City of N. Y.) Asst. Stat., 1432 Whittier St., N. W., Wash- 
ington, D. C. 

Sard, Asst. Prof. Arthur. Ph.D. (Harvard) Queens College, Flushing, N. Y. 146-19 
Beech Ave. 

Schapiro, Anne. B.A. (Bryn Mawr) Jr. Analyst, Institute of Applied Econometrics, 
350 W. 57 St., New York, N. Y. 

Simpson, William B. Grad. Student, Columbia Univ., New York, N. Y. 

Springer, Melvin D. M.S. (Illinois) Asst. Instr., Univ. of Illinois, Urbana, Ill.

Stein, Irving. B.S. (Mass Inst. Tech.) Asso. Stat., War Dept., Washington, D. C. 611 
Oglethorpe St. 

Stergion, Andrew P. M.S. (Mass Inst. Tech.) 1st Lt., USA, The Proving Center, Aberdeen Proving Gd., Md.

Sternhell, Arthur I. B.A. (New York) Staff Asst., Metropolitan Life Ins. Co., 1938 E. 
Tremont Ave., Parkchester, N. Y. 

Thompson, Louis T. E. Ph.D. (Clark) Dir. Res. and Dev., Lukas-Harold Corp., In- 
dianapolis, Ind. 340 East Maple Rd. 

Tyler, Asst. Prof. George W. M.A. (Duke) Virginia Polytechnic Inst., Blacksburg, Va. 

Working, Holbrook S. Ph.D. (Wisconsin) Chief Stat. Consultant, War Production 
Board, Washington, D. C. Food Res. Inst., Stanford Univ., Calif. 
The following persons have been elected to Junior membership in the Institute:

Blumenthal, Lydia. Hunter College, New York, N. Y. 1001 Lincoln Pl., Brooklyn, N. Y.
Gunlogson, Lee. Univ. of Minnesota, Minneapolis, Minn. 1906 Third Ave. 

Heacock, Richard R. Oregon State Coll., Corvallis, Ore. P. O. Box 207, Seaside, Ore.
Locatelli, Humbert J. Columbia Univ., New York, N. Y. 44 Seaman Ave. 

Mathisen, Harold C., Jr. Princeton Univ., Princeton, N. J. 4 Middle Dod Hall. 

Murphy, Ray Bradford. Princeton Univ., Princeton, N. J. 28 Godfrey Rd., Upper Mont- 
clair, N. J. 

Peters, Edward J., Jr. Georgetown Univ., Washington, D. C. 126 St. James Pl., Atlantic
City, N. J. 

Smith, Joan T. Univ. of Minnesota, St. Paul, Minn. 673 East Nebraska Ave.



SPECIAL COURSES IN STATISTICAL QUALITY CONTROL 


The application of statistics to quality control is now being furthered in a 
program in which the War Production Board and the U. S. Office of Education 
are cooperating to assist statisticians in various industrial areas to provide 
suitable courses of instruction sponsored by their own institutions. 

The general plan of the program has been influenced by two conclusions 
drawn from the experience gained in ESMWT courses carried on by Stanford 
University during 1942-43.¹ These conclusions were: (1) that a short full-time course in statistical quality control tends to be peculiarly effective; and
(2) that it is vital to have the initial courses followed by meetings in which the 
course members gather to report on applications they have made and to receive 
encouragement and any needed assistance. 

The giving of short full-time courses presents a problem of assembling a suitable staff, since four instructors will ordinarily be needed. If this problem were solved by arranging for a single staff to tour all the principal industrial regions giving courses in quality control, the local leadership necessary for establishing widespread use of statistical methods of quality control in industry would not be developed. The program adopted seems to offer an effective solution of these
problems. 

Under the program now in effect, the War Production Board, through its Office of Production Research and Development, supplies an experienced person to assist with the arrangement of courses and to participate in the instruction.² Two of the instructors in each course will ordinarily be provided by a local educational institution, which will also promote the course and make necessary local arrangements through its institutional representative of the Engineering Science and Management War Training program. It is not considered necessary that the instructors provided by the institution have previous experience with statistical quality control, provided they are sufficiently competent in the theory of sampling, but it is desirable that at least one of them have practical experience with quality control. It may often happen that one of the instructors can be a quality control man from a local industrial establishment. The representative of the WPB will assist with arrangements for bringing in one (or, where needed, two) additional outside instructors.

The sponsoring institution's costs for the courses, which do not include the
salary and expenses of the representative of the WPB, may be provided through 
the ESMWT program. The follow-up work with men who have taken the 
initial courses may be arranged also as part of the ESMWT program of the 

¹ A description of these courses offered by Stanford University appeared in the Annals of Mathematical Statistics, March 1943, p. 96.

² At present Professor Holbrook Working is serving in this capacity.
educational institution sponsoring the original course. The follow-up work should be handled by a local instructor who participated in the original course.

The two basic courses and the one follow-up course that have already been 
given by Stanford University were conducted under essentially the plan outlined above, except that they did not have the benefit of assistance from the WPB. Three courses have thus far (May 25) been arranged under the new plan: one sponsored by Rhode Island State College, to be held during May 27
to June 2 at Newport, and two sponsored by Stanford University, to be held 
respectively in Los Angeles, June 13 to 20, and in San Francisco, June 22 to 29. 
Preliminary steps have been taken toward the arrangement of several additional 
courses. 



REPORT OF THE NEW YORK MEETING OF THE INSTITUTE 

A joint meeting between the Institute and the American Society of Mechanical 
Engineers was held on Saturday, May 29, 1943 at the Engineering Societies 
Building, 29 West 39th Street, New York City. Of the ninety-five individuals 
attending the meeting, the following fifty-seven members of the Institute were 
present: 

Theodore W. Anderson, K. J. Arnold, Robert E. Bechhofer, B. M. Bennett, C. I. Bliss, 
Mary E. Boozer, P. Boschan, A. H. Bowker, Burton H. Camp, A. C. Cohen Jr., H. F. Dodge, 

C. Eisenhart, Mary L. Elveback, W. C. Flaherty, H. Goode, John I. Griffin, Charles C. 
Grove, Frank E. Grubbs, E. J. Gumbel, Harold Hotelling, J. M. Juran, B. F. Kimball, 
Lila Knudsen, Howard Levene, E. Vernon Lewis, Simon Lopata, Frank W. Lynch, 
Henry Mann, E. C. Molina, N. Morrison, Philip J. McCarthy, Luis F. Nanni, 
Franklin S. Nelson, M. L. Norden, P. S. Olmstead, R. F. Passano, Edward Paulson, G. A. 

D. Preinreich, A. C. Rosander, Arthur Sard, Henry Scheffé, Bernice Scherl, Edward M.
Schrock, L. W. Shaw, William B. Simpson, S. G. Small, Arthur Stein, Andrew P. Stergion, 
M. Stevens, David F. Votaw Jr., A. Wald, Helen M. Walker, W. A. Wallis, S. S. Wilks, J. 
Wolfowitz, L. C. Young. 

The general topic of the meeting was Industrial Applications of Statistics. At 
the morning session the following papers were presented, with Professor Harold 
Hotelling presiding: 

1. On the Theory of Runs with some Application to Quality Control. 

J. Wolfowitz. 

2. On the Presentation of Data as Evidence. 

Churchill Eisenhart. 

At the afternoon session, the following papers were presented with Mr. E. C. 
Molina as Chairman: 

1. A Sampling Inspection Plan for Continuous Production. 

H. F. Dodge. 

2. Tolerances and Product Acceptability. 

L. C. Young. 

A meeting of the Board of Directors was held after the afternoon session. 

Edwin G. Olds 

Secretary 




Similarly we have 


..(w) = t j:^„ 


[m] 

b 


It remains to be seen whether the series in the curly brackets can be summed.

Using a formula in footnote 3, we may obtain the desired formula in another 
way. We have, in fact, 


?.((»<)) = S Pw ((»')) 


b^tn+n—e 
The “complete” series
The “incomplete” series we denote by

Then we may write 

Pa(W) = "e" ^ ~ IV Pi"’ + L (- l)'^X(n, a, b, m)Pi”\ 
ON THE PROBLEM OF TESTING HYPOTHESES


By R. v. Mises
Harvard University 

1. Introduction. The following is known as the problem of testing a simple statistical hypothesis. The probability distribution of a variate $X$ depends on a parameter $\vartheta$. In the course of experiments, each time a value $x$ of $X$ is observed, one pronounces one of two assertions: “$\vartheta$ equals $\vartheta_0$” or “$\vartheta$ is different from $\vartheta_0$.” The first assertion is made when the observed value $x$ falls in a “region of acceptance” $A$, the second if $x$ falls in the complementary region $\bar{A}$. What is the chance of these assertions being correct, and how can $A$ be chosen to make this chance as high as possible?

The distribution of the variate $X$ is considered as given. Let $P(x \mid \vartheta)$ be the probability of the value of $X$ being $\le x$. It is obvious that knowing $P(x \mid \vartheta)$ is not sufficient for computing the success or error chances of the above assertions. There is another distribution function $P_0(\vartheta)$ involved, which we may call the initial, or a priori, or over-all distribution of the parameter $\vartheta$. The meaning of $P_0(\vartheta)$ is as follows. In the infinite sequence of trials there will be, among the first $N$ experiences, $N_1$ cases where the assertion that the parameter value is $\le \vartheta$ proves correct. Then $P_0(\vartheta)$ is the limit of the ratio $N_1/N$ when $N$ tends to infinity. If $N_0$ is the number of cases in which the actually pronounced assertions, $\vartheta = \vartheta_0$ or $\vartheta \ne \vartheta_0$ respectively, prove correct, the limit of $N_0/N$ is the success chance and that of $1 - N_0/N$ the error chance of the test under consideration. It would not make any sense to assume that an error chance exists but the over-all chance $P_0(\vartheta)$ does not.¹

The success and error chances for the assertions $\vartheta = \vartheta_0$ and $\vartheta \ne \vartheta_0$ depend on both functions $P(x \mid \vartheta)$ and $P_0(\vartheta)$. But in most practical cases nothing or very little is known about the parameter distribution. Usually, only the limits within which $\vartheta$ varies are known, or a set of distinct values is given which $\vartheta$ can assume. Therefore, the problem of testing a hypothesis must be modified in the following way. We ask: What can be said about the error and success chances of the two alternative assertions and about the choice of the region of acceptance, if $P_0(\vartheta)$ is entirely or partly unknown? This form of the question corresponds more or less to the conception generally adopted today.

In section 4 of this paper a complete answer to the question is presented for the case of a parameter distribution that is entirely unknown except for the range of possible $\vartheta$-values. This solution, with the restriction to a parameter assuming distinct values only, was already given by Robert W. B. Jackson in a paper devoted mainly to some genetical problems [1]. The particular circumstances prevailing under the restriction to distinct parameter values will be discussed in section 8. In section 6 the result is extended to composite hypotheses, and in section 7 to problems in several dimensions. An important case of restrictions imposed on $P_0(\vartheta)$ is discussed in section 9.

¹ The expression “chance” rather than “probability” is used here since no randomness is required. Cf. the author’s paper [2] p. 157.

In the preceding lines the subject of testing a statistical hypothesis was presented in its simplest form, with one scalar variate and one parameter, in order to discard all non-essential complications which would serve only to veil the principal point. For the same reason it is to be understood, in the following text, that region (in one dimension) will mean an interval or a finite number of intervals, and distribution will mean a set of concentrated values at distinct points with a continuous density in between, or a continuous density throughout. If, for the sake of brevity, a Stieltjes integral is used, nothing else is meant than the combination of a sum and an ordinary integral of a continuous function. With respect to the parameter, the distributions $P(x \mid \vartheta)$ are considered as either defined for distinct $\vartheta$-values only or as continuous functions, etc.

2. Error chance. Success rate. J. Neyman, who must be credited with successfully promoting many problems of mathematical statistics, introduced the distinction between errors of first and second type and made this the basis of his approach in dealing with the theory of tests. An error of first kind is committed if the assertion $\vartheta \ne \vartheta_0$ is made when $\vartheta$ equals $\vartheta_0$; an error of second kind occurs when the assertion $\vartheta = \vartheta_0$ proves incorrect.² The chances $P_I$ and $P_{II}$ of these two events can easily be computed if the distributions $P(x \mid \vartheta)$ and $P_0(\vartheta)$ are considered as known. From $P(x \mid \vartheta)$ we derive the probability $P(A \mid \vartheta)$ of $x$ falling in the region $A$. In particular, $P(A \mid \vartheta_0)$ will be designated by $1 - \alpha$. Thus $\alpha$ is the probability of $x$ falling in $\bar{A}$ when $\vartheta = \vartheta_0$. The function $P_0(\vartheta)$ can have, at the point $\vartheta = \vartheta_0$, a jump of magnitude $\pi_0$. The set of all $\vartheta$-values except $\vartheta_0$ will be called $R$. Then the two error chances are obviously

(1)  $$P_I = \alpha\pi_0, \qquad P_{II} = \int_{(R)} P(A \mid \vartheta)\, dP_0(\vartheta).$$

By the integral over $R$ is meant that the term $P(A \mid \vartheta_0)\pi_0$ in the summation has to be omitted. The formulae (1) show anew that it would be senseless to speak of error chances without assuming that an over-all distribution $P_0(\vartheta)$ exists.

In all papers that follow Neyman’s line of thought, first and second type error chances are discussed. But the formulae (1) are seldom written down.³ It is incorrect to say that $\alpha$ is the chance of a first type error, and it is likewise incorrect to say that the chance of a second type error depends on $\vartheta$; it depends on the distribution of $\vartheta$.

The total error chance is

(2)  $$P_E = P_I + P_{II} = \alpha\pi_0 + \int_{(R)} P(A \mid \vartheta)\, dP_0(\vartheta),$$

and $1 - P_E$ is the success chance. If the distribution $P(x \mid \vartheta)$, the region of acceptance $A$, and the test value $\vartheta_0$ are given, $P_E$ depends on $P_0(\vartheta)$ only. If we make $P_0(\vartheta)$ coincide successively with all functions not excluded by some preliminary knowledge about the over-all distribution, there must exist a definite least upper bound (l.u.b.) of $P_E$, since $P_E$ has the upper bound 1. The value

$$S = 1 - \operatorname{l.u.b.} P_E$$

is the greatest lower bound of the success chance. In other words, the success chance is at least $S$ for every admissible $P_0(\vartheta)$, while for any positive $\epsilon$ there exists a $P_0(\vartheta)$ for which the success chance is smaller than $S + \epsilon$. We therefore call $S$ the sure success rate or, briefly, the success rate of the test under consideration. If the success rate $S'$ for a region of acceptance $A'$ is greater than $S$, the test using $A'$ will be briefly called preferable to that using $A$.

² See e.g. ref. [4], [5] or various other publications by the same author.

³ They are included e.g. in equation (1) of A. Wald’s paper [5].

Neyman’s approach consists in comparing two regions $A$ and $A'$ with the same $\alpha$. The difference of the respective error chances $P_E$ and $P'_E$ is, according to (2),

(3)  $$P_E - P'_E = \int_{(R)} \big[P(A \mid \vartheta) - P(A' \mid \vartheta)\big]\, dP_0(\vartheta).$$

This difference is non-negative, whatever is taken for $P_0(\vartheta)$, if for all values of $\vartheta$

(4)  $$P(A \mid \vartheta) \ge P(A' \mid \vartheta).$$

In this case $P_E \ge P'_E$ and $\operatorname{l.u.b.} P_E \ge \operatorname{l.u.b.} P'_E$, and therefore $S \le S'$. If a region $A'$ can be found for which (4) holds whatever region $A$ with the same $\alpha$ is taken, Neyman calls the test using $A'$ a most powerful test. In fact, this test has at least as large a success rate as any other test using a region of acceptance with the same $\alpha$. Neyman does not use the concept of success rate as introduced here, but implicitly the success chance is the criterion underlying his analysis of tests.⁴

The theory of most powerful tests would supply a complete solution of our problem if (1) a most powerful test existed in all cases, i.e. for all distributions $P(x \mid \vartheta)$ and all $\vartheta_0$, and (2) a sufficient indication of how to choose $\alpha$ were given. Unfortunately it turns out that in almost no practical case can a region $A'$ of this kind be found. The various substitutes for a most powerful test proposed by Neyman and others (unbiased test, test of type A, etc.) need not be discussed here, since it is obvious that nothing can be said about the difference $S - S'$ if (4) is not fulfilled for all $A$ and $\vartheta$. As to the choice of $\alpha$, the expression “level of significance” used by Neyman leaves it open whether a high or a low value of $\alpha$ is preferable.

⁴ This can be seen e.g. from the justification of most powerful tests as given by A. Wald [7] p. 16-16. Moreover, the recommendation of a test with highest success rate as the “best” (which is not the purpose of the present paper) could be justified from the standpoint of the general theory developed by Wald [6]. Wald introduces an arbitrary weight function for defining a “best” test. If the error weight is taken as one in the case of a false answer and as zero for each correct answer, Wald’s “best” test coincides with the test of highest success rate. The present paper includes only statements that refer to the actual numbers of correct and false answers, independently of any arbitrary assumption about an error weight.


3. Preliminary example. Before attacking the general problem, the discussion of a very simple example may provide some information. Let the distribution of the variate $X$ be given by the density

(5)  $$p(x \mid \vartheta) = 1 + \vartheta^2\left(x^2 - \tfrac{1}{3}\right), \qquad 0 \le x \le 1.$$

It is immediately seen that the integral of $p$ over the interval 0 to 1 equals 1 for each $\vartheta$, and that $p \ge 0$ if $\vartheta$ lies within the limits $-\sqrt{3}$, $\sqrt{3}$. Let this be the only information we possess about the over-all distribution $P_0(\vartheta)$. The value to be tested may be $\vartheta_0 = 0$. The density for this parameter value reduces to $p(x \mid 0) = 1$, and thus the probability of $x$ falling within the interval $x_1, x_2$ equals $x_2 - x_1$ if $\vartheta = \vartheta_0$. According to the notation introduced above, we may consider as intervals of acceptance $A$ all intervals with the limits $x_1$, $x_1 + 1 - \alpha$, where $0 \le x_1 \le \alpha$.

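The function $P(A \mid \vartheta)$ is now given by

(6)  $$P(A \mid \vartheta) = \int_{x_1}^{x_1 + 1 - \alpha} p(x \mid \vartheta)\, dx = 1 - \alpha + (1 - \alpha)\vartheta^2\left[x_1^2 + x_1(1 - \alpha) - \frac{\alpha(2 - \alpha)}{3}\right].$$

In particular, for the interval $A'$ between 0 and $1 - \alpha$:

(7)  $$P(A' \mid \vartheta) = 1 - \alpha - (1 - \alpha)\vartheta^2\,\frac{\alpha(2 - \alpha)}{3}.$$

The difference of these two expressions is non-negative:

(8)  $$P(A \mid \vartheta) - P(A' \mid \vartheta) = (1 - \alpha)\vartheta^2 x_1(x_1 + 1 - \alpha) \ge 0.$$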

Thus the interval $0, 1 - \alpha$ is seen to be a most powerful one. The error chance of this test is, according to (2),

(9)  $$P'_E = \alpha\pi_0 + \int_{(R)} \left[1 - \alpha - \vartheta^2(1 - \alpha)\frac{\alpha(2 - \alpha)}{3}\right] dP_0(\vartheta) = \alpha\pi_0 + (1 - \alpha)(1 - \pi_0) - (1 - \alpha)\frac{\alpha(2 - \alpha)}{3}\int_{(R)} \vartheta^2\, dP_0(\vartheta).$$

The last integral is non-negative and can approach zero indefinitely, since the total amount $1 - \pi_0$ can be concentrated at a point $\vartheta \ne 0$ with $\vartheta^2 < \epsilon$. Therefore the l.u.b. of $P'_E$ for given $\alpha$ and $\pi_0$ is

$$\alpha\pi_0 + (1 - \alpha)(1 - \pi_0).$$

On the other hand, this is a linear function of $\pi_0$ which takes its extreme values at the ends of its interval, $\pi_0 = 0$ and $\pi_0 = 1$. Thus the larger of the two values $\alpha$ and $1 - \alpha$ is the l.u.b. of $P'_E$ if $P_0(\vartheta)$ is subjected to no further restriction. The success rate of the test under consideration is accordingly the smaller of the two quantities $\alpha$ and $1 - \alpha$.

For $\alpha = 0.99$ or $\alpha = 0.01$ the success rate is 0.01. This means: if we use the most powerful test at a level of significance of either 99% or 1%, we risk in both cases that 99% of all assertions will be false. If $\alpha = \tfrac{1}{2}$, the success rate reaches its maximum value, which is $\tfrac{1}{2}$ too. On the other hand, it can be seen that each interval of length $\tfrac{1}{2}$ with not too large $x_1$ would lead to the same success rate. In fact, the error chance $P_E$ for the interval $x_1, x_1 + 1 - \alpha$ is, according to (9) and (6),

(9')  $$P_E = \alpha\pi_0 + (1 - \alpha)(1 - \pi_0) - (1 - \alpha)\left[\frac{\alpha(2 - \alpha)}{3} - x_1(x_1 + 1 - \alpha)\right]\int_{(R)} \vartheta^2\, dP_0(\vartheta).$$

Therefore the same reasoning as before applies if the factor in brackets is non-negative. For $\alpha = \tfrac{1}{2}$ this is the case if the interval begins at a point $x_1 \le \tfrac{1}{4}(\sqrt{5} - 1) = 0.309$. Among these intervals, the one with $x_1 = 0$ can be considered as preferable, since its success chance for any $P_0(\vartheta)$ is at least as high as that of any other interval.
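As a numerical check of (6) and of the success rates just derived, here is a minimal Python sketch. It assumes the density (5) on $0 \le x \le 1$ with $|\vartheta| \le \sqrt{3}$. The l.u.b. $\beta$ is approximated by a maximum over a finite $\vartheta$-grid plus two probe points next to $\vartheta_0 = 0$, and a midpoint rule serves only to confirm the closed form. All function names are illustrative.

```python
import math

def p_accept(theta, x1, alpha):
    """Closed form (6): P(A | theta) for A = [x1, x1 + 1 - alpha] under density (5)."""
    width = 1.0 - alpha
    bracket = x1 * x1 + x1 * width - alpha * (2.0 - alpha) / 3.0
    return width + width * theta ** 2 * bracket

def p_accept_numeric(theta, x1, alpha, steps=10_000):
    """Midpoint-rule integral of p(x | theta) = 1 + theta^2 (x^2 - 1/3) over A, to check (6)."""
    width = 1.0 - alpha
    h = width / steps
    total = sum(1.0 + theta ** 2 * ((x1 + (i + 0.5) * h) ** 2 - 1.0 / 3.0)
                for i in range(steps))
    return total * h

def success_rate(x1, alpha, grid_size=2001):
    """min(1 - alpha, 1 - beta), with beta approximated over theta != 0 in [-sqrt 3, sqrt 3]."""
    bound = math.sqrt(3.0)
    thetas = [-bound + 2.0 * bound * k / (grid_size - 1) for k in range(grid_size)]
    thetas = [t for t in thetas if abs(t) > 1e-12] + [1e-9, -1e-9]  # probe near theta0 = 0
    beta = max(p_accept(t, x1, alpha) for t in thetas)
    return min(1.0 - alpha, 1.0 - beta)

for alpha in (0.01, 0.5, 0.99):
    print(alpha, round(success_rate(0.0, alpha), 4))  # A' = [0, 1 - alpha]: min(alpha, 1 - alpha)
print(round(success_rate(0.309, 0.5), 4), round(success_rate(0.5, 0.5), 4))
```

With $x_1 = 0$ the sketch reproduces $S = \min(\alpha, 1 - \alpha)$. For $\alpha = \tfrac{1}{2}$ it gives $S = \tfrac{1}{2}$ at $x_1 = 0.309$, but only $0.125$ at $x_1 = 0.5$, where the bracket in (9') is negative.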

Now let us assume that in the definition (5) of $p(x \mid \vartheta)$ the factor $\vartheta^2$ is replaced by some function $g(\vartheta)$ which takes positive and negative values (within $-\tfrac{3}{2}$ and 3) while $\vartheta$ varies from $-\sqrt{3}$ to $\sqrt{3}$. Then equation (6) shows that for any two intervals of acceptance $A$ and $A'$ the difference $P(A \mid \vartheta) - P(A' \mid \vartheta)$ changes its sign at least once with varying $\vartheta$. Thus no most powerful test interval exists. But, applying (9) and calling $g_1$ the (negative) minimum value of $g(\vartheta)$, we now find

$$\alpha\pi_0 + (1 - \alpha)(1 - \pi_0) - g_1(1 - \alpha)\frac{\alpha(2 - \alpha)}{3}(1 - \pi_0)$$

as the l.u.b. of the error chance of $A'$ for given $\alpha$ and $\pi_0$. Thus the smaller of the quantities

$$1 - \alpha \qquad\text{and}\qquad 1 - (1 - \alpha)\left[1 - g_1\frac{\alpha(2 - \alpha)}{3}\right]$$

is the success rate of the test using $A'$. If $g_1$ is given, we can find by differentiation the value of $\alpha$ supplying the highest success rate. Using (9') instead of (9), we find in a similar way the success rates for any other interval. It turns out that $S = \tfrac{1}{2}$ for the interval extending from the value $x_1 = 0.309$ given above to 0.809.
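The best $\alpha$ for the test $A'$ can also be located numerically. The sketch below is a hypothetical illustration and does not follow the differentiation mentioned in the text. It scans a grid of $\alpha$ values for a given $g_1$ and keeps the one with the largest success rate. The choice $g_1 = -\tfrac{3}{2}$ (the lower end of the admissible range) is arbitrary.

```python
def success_rate_g(alpha, g1):
    """Success rate of A' = [0, 1 - alpha] when theta^2 in (5) is replaced by g(theta)
    with (negative) minimum g1: min(1 - alpha, 1 - (1 - alpha)[1 - g1*alpha*(2 - alpha)/3])."""
    beta = (1.0 - alpha) * (1.0 - g1 * alpha * (2.0 - alpha) / 3.0)
    return min(1.0 - alpha, 1.0 - beta)

def best_alpha(g1, steps=100_000):
    """Grid search over alpha in (0, 1); stands in for the differentiation in the text."""
    candidates = (k / steps for k in range(1, steps))
    return max(candidates, key=lambda a: success_rate_g(a, g1))

g1 = -1.5                        # hypothetical: lower end of the admissible range
a_star = best_alpha(g1)
print(round(a_star, 3), round(success_rate_g(a_star, g1), 3))  # 0.586 0.414
print(success_rate_g(0.5, g1))   # 0.3125, i.e. 5/16 at alpha = 1/2
```

For $g_1 = -\tfrac{3}{2}$ the optimum lies where the two quantities are equal. Writing $s = 1 - \alpha$, this condition reduces to $s^3 - 5s + 2 = 0$, whose root in $(0, 1)$ is $s = \sqrt{2} - 1$. So $\alpha = 2 - \sqrt{2} \approx 0.586$ and $S = \sqrt{2} - 1 \approx 0.414$, below the value $\tfrac{1}{2}$.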

There are three things we may learn from this example. (1) It can happen that a most powerful test, at a high or at a low level of significance, has an extremely poor success rate. (2) In the case where a most powerful test with the highest possible success rate exists, there may be other intervals with the same success rate. (3) If no most powerful test exists, there is no need to look for some substitute definition; the success rate of any kind of test can be found independently of its being most powerful or not.

4. General solution for a simple hypothesis. The distribution $P(x \mid \vartheta)$ of the variate $X$, the parameter value $\vartheta_0$ to be tested, and the set of all possible values of $\vartheta$ are supposed to be given. The set of all possible $\vartheta$-values except $\vartheta_0$ is called $R$. Choose a region of acceptance $A$ and compute first, for all $\vartheta$, the magnitude

(10)  $$P(A \mid \vartheta) = \int_{(A)} dP(x \mid \vartheta).$$

In particular, the value of this integral for $\vartheta = \vartheta_0$ will be called $1 - \alpha$, and its maximum value or its l.u.b. on $R$ will be denoted by $\beta$:

(11)  $$P(A \mid \vartheta_0) = 1 - \alpha, \qquad \operatorname{l.u.b.}_{(R)} P(A \mid \vartheta) = \beta.$$

The chance of committing an error in asserting $\vartheta = \vartheta_0$ when $x$ falls in $A$, or $\vartheta \ne \vartheta_0$ when $x$ falls in the complement $\bar{A}$, is according to (2)

$$P_E = \alpha\pi_0 + \int_{(R)} P(A \mid \vartheta)\, dP_0(\vartheta),$$

where $\pi_0$ is the jump of $P_0(\vartheta)$ at the abscissa $\vartheta = \vartheta_0$, i.e. the a priori chance of $\vartheta_0$. The total weight that $P_0(\vartheta)$ assigns to the domain of integration $R$ is $1 - \pi_0$, and therefore $\beta(1 - \pi_0)$ is the l.u.b. of the integral. Thus⁵

$$\operatorname{l.u.b.} P_E = \max_{0 \le \pi_0 \le 1}\,\{\alpha\pi_0 + \beta(1 - \pi_0)\}.$$

As $\pi_0$ can take all values between zero and one, the least upper bound of $P_E$ is either $\alpha$ or $\beta$, whichever is larger. The success rate $S$, i.e. the greatest lower bound of $1 - P_E$, is consequently the smaller of the quantities $1 - \alpha$ and $1 - \beta$.

If the distribution $P(x \mid \vartheta)$ is given and a region of acceptance $A$ for a test value $\vartheta_0$ chosen, the success rate of this test equals the smaller of the two quantities

(12)  $$1 - \alpha = P(A \mid \vartheta_0) \qquad\text{and}\qquad 1 - \beta = 1 - \operatorname{l.u.b.}_{(R)} P(A \mid \vartheta),$$

if nothing is known about the initial distribution of the parameter except its range. Finding a region of acceptance $A$ with the highest success rate is then a simple maximum-minimum problem.

This solution is not restricted to some rarely occurring type of distributions $P(x \mid \vartheta)$, and it is insofar a complete one as it does not leave the value of $\alpha$ undetermined. Using Neyman’s terminology, we would have to say: the success rate is the smaller of the two quantities 1 minus level of significance, and minimum power of the test.

It follows from the definitions (12) that, if $P(A \mid \vartheta)$ is continuous in an interval including $\vartheta_0$, and $\vartheta$ is allowed to take all values of this interval, $\beta$ cannot be smaller than $1 - \alpha$, since $P(A \mid \vartheta)$ approaches $P(A \mid \vartheta_0) = 1 - \alpha$ as $\vartheta$ tends to $\vartheta_0$ through values in $R$:

$$\beta \ge 1 - \alpha \qquad\text{or}\qquad \alpha + \beta \ge 1.$$

Thus $1 - \alpha$ and $1 - \beta$ cannot possibly both be greater than $\tfrac{1}{2}$. The greatest possible success rate is then $\tfrac{1}{2}$, and it can be reached only if $\alpha = \beta = \tfrac{1}{2}$. We state: No test can have a success rate $S$ greater than $\tfrac{1}{2}$ if $\vartheta$ can vary in an interval including $\vartheta_0$ without any restriction and $P(A \mid \vartheta)$ is a continuous function of $\vartheta$ in this interval. We will see later, in sections 8 and 9, how certain restrictions imposed on $P_0(\vartheta)$, which are effective in some problems, improve the success rate of a test.

⁵ This formula was given by Jackson [1] p. 148 for the “case when the set of alternatives is discontinuous”. Jackson calls the test with highest success rate a “most stringent test”.
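The recipe (12) can be phrased as a small routine. The following Python sketch rests on stated assumptions. $P(A \mid \vartheta)$ is supplied as a function, and the set $R$ is replaced by a finite grid. The grid maximum can only underestimate the l.u.b. $\beta$, so the computed $S$ can only err upward. The model used to exercise the routine, a unit-variance normal location family with interval regions, is a hypothetical choice.

```python
import math

def success_rate(p_accept, theta0, alternatives):
    """Success rate (12): the smaller of 1 - alpha = P(A | theta0) and 1 - beta.

    `alternatives` is a finite sample of R, so the max below only approximates
    the l.u.b. beta from below.
    """
    one_minus_alpha = p_accept(theta0)
    beta = max(p_accept(t) for t in alternatives if t != theta0)
    return min(one_minus_alpha, 1.0 - beta)

def unit_normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def normal_interval(x1, x2):
    """Hypothetical model: X ~ N(theta, 1) and A = [x1, x2]; returns theta -> P(A | theta)."""
    def p_accept(theta):
        return unit_normal_cdf(x2 - theta) - unit_normal_cdf(x1 - theta)
    return p_accept

# Alternatives: a grid on [-5, 5] with spacing 0.001 (theta0 = 0 itself is skipped).
grid = [k / 1000.0 for k in range(-5000, 5001)]
for x1, x2 in [(-0.6745, 0.6745), (-2.0, 2.0), (-0.3, 1.2)]:
    S = success_rate(normal_interval(x1, x2), 0.0, grid)
    print(x1, x2, round(S, 4))
```

Every interval tried gives $S \le \tfrac{1}{2}$ up to grid error, in line with the statement above. The symmetric interval $(-0.6745, 0.6745)$ comes closest.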

5. Examples. Let us assume that the variate $X$ is normally distributed according to

(13)  $$P(x \mid \vartheta) = \Phi[h(x - \vartheta)], \qquad \Phi(u) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{u} e^{-t^2}\, dt.$$

The parameter value to be tested may be taken as $\vartheta_0 = 0$ without loss of generality, since in all other cases $X - \vartheta_0$ can be considered as the variate. If the interval $x_1, x_2$ is chosen for the region of acceptance, we have

(14)  $$P(A \mid \vartheta) = \Phi[h(x_2 - \vartheta)] - \Phi[h(x_1 - \vartheta)].$$

The right-hand side becomes a maximum if

$$\Phi'[h(x_2 - \vartheta)] = \Phi'[h(x_1 - \vartheta)], \qquad\text{i.e.}\qquad \vartheta = \tfrac{1}{2}(x_1 + x_2).$$

Therefore, for $\vartheta_0 = 0$,

$$1 - \alpha = \Phi(hx_2) - \Phi(hx_1), \qquad \beta = \Phi\big(\tfrac{1}{2}h(x_2 - x_1)\big) - \Phi\big(\tfrac{1}{2}h(x_1 - x_2)\big).$$

Both quantities have the value $\tfrac{1}{2}$ if and only if

(15)  $$x_1 = -x_2, \qquad \Phi(hx_1) = \tfrac{1}{4}, \qquad \Phi(hx_2) = \tfrac{3}{4}.$$

These are the probable limits of $x$. The conclusion is that the probable limits supply the interval with the highest possible success rate, $S = \tfrac{1}{2}$.

The result is not restricted to the particular form of the function $\Phi$; it remains valid if $\Phi$ is replaced by any function whose derivative has one maximum and decreases symmetrically on both sides. It is well known that this test, which has always been used by statisticians and is here proved to have the maximum success rate, is neither most powerful nor even, for a general $\Phi$, unbiased. We also see that the interval determined by (15) is the only closed interval with maximum success rate.
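The probable limits in (15) are easy to compute. With the Gauss form assumed in (13), $\Phi(u) = \tfrac{1}{2}(1 + \operatorname{erf} u)$. The sketch below uses an illustrative bisection helper to solve $\Phi(hx_2) = \tfrac{3}{4}$, and then evaluates $1 - \alpha$ and $\beta$ from the formulas above.

```python
import math

def Phi(u):
    """Gauss form of (13): Phi(u) = (1/sqrt(pi)) * integral_{-inf}^{u} e^{-t^2} dt."""
    return 0.5 * (1.0 + math.erf(u))

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection, assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def probable_limits(h):
    """Symmetric interval (x1, x2) = (-c, c) with Phi(h * c) = 3/4, as in (15)."""
    c = bisect(lambda x: Phi(h * x) - 0.75, 0.0, 10.0 / h)
    return -c, c

def alpha_beta(x1, x2, h):
    """(1 - alpha, beta) for theta0 = 0, using (14) and the maximizer (x1 + x2) / 2."""
    one_minus_alpha = Phi(h * x2) - Phi(h * x1)
    beta = Phi(0.5 * h * (x2 - x1)) - Phi(0.5 * h * (x1 - x2))
    return one_minus_alpha, beta

x1, x2 = probable_limits(h=1.0)
print(round(x2, 4))              # 0.4769
print(alpha_beta(x1, x2, 1.0))   # both values equal 1/2 up to rounding
```

For $h = 1$ this gives $x_2 \approx 0.4769$, the classical value of $h$ times the probable error. For other $h$ the limits scale as $0.4769/h$.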

Our method supplies the analogous solution for the case of an unsymmetric distribution also. Assume the density

(16)  $$p(x \mid \vartheta) = f(x - \vartheta),$$

where $f(u)$ is supposed to have only one maximum, say at the point $u = 0$. The value to be tested may again be chosen as $\vartheta_0 = 0$. For the interval $x_1, x_2$ as region of acceptance we have

$$P(A \mid \vartheta) = \int_{x_1}^{x_2} f(x - \vartheta)\, dx = \int_{x_1 - \vartheta}^{x_2 - \vartheta} f(u)\, du.$$

The last expression becomes a maximum with respect to $\vartheta$ if

$$f(x_1 - \vartheta) = f(x_2 - \vartheta).$$

The maximum will occur at the point $\vartheta = 0$, and accordingly coincide with $1 - \alpha$, if $f(x_1) = f(x_2)$. Thus we have a region of acceptance with the highest possible success rate if $x_1, x_2$ are determined by

(17)  $$\int_{x_1}^{x_2} f(u)\, du = \tfrac{1}{2}, \qquad f(x_1) = f(x_2).$$

Under the assumptions made for $f(u)$, there exists exactly one pair of values $x_1, x_2$ obeying these equations. This kind of test, too, has been much used by statisticians, but an account of its merits has so far not been given.

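Another example is supplied by the density function

(18)  $$p(x \mid \vartheta) = \vartheta^2 x e^{-\vartheta x}, \qquad x \ge 0, \quad \vartheta > 0.$$

We derive for an interval $x_1, x_2$

$$P(A \mid \vartheta) = \int_{x_1}^{x_2} p(x \mid \vartheta)\, dx = (\vartheta x_1 + 1)e^{-\vartheta x_1} - (\vartheta x_2 + 1)e^{-\vartheta x_2}.$$

If $\vartheta_0$ is the value to be tested, we have

(19)  $$1 - \alpha = (\vartheta_0 x_1 + 1)e^{-\vartheta_0 x_1} - (\vartheta_0 x_2 + 1)e^{-\vartheta_0 x_2}.$$

One may ask for an interval $x_1, x_2$ with the success rate $S = \tfrac{1}{2}$. Then equation (19) must be fulfilled with $\alpha = \tfrac{1}{2}$ and, moreover, $P(A \mid \vartheta)$ must take its maximum value at $\vartheta = \vartheta_0$. This provides the second condition

(19')  $$\frac{\partial P(A \mid \vartheta)}{\partial \vartheta} = 0 \ \text{at}\ \vartheta = \vartheta_0, \qquad\text{i.e.}\qquad x_1^2 e^{-\vartheta_0 x_1} = x_2^2 e^{-\vartheta_0 x_2}.$$

There exists, for each $\vartheta_0 > 0$, one and only one pair of values $x_1, x_2$ obeying the two equations (19) and (19').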
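Equations (19) and (19') can be solved numerically. The Python sketch below shows one possible approach, which is not taken from the text. Writing $y = \vartheta_0 x$ removes $\vartheta_0$. The function $y^2 e^{-y}$ is unimodal with its peak at $y = 2$, so for each $y_1 < 2$ there is exactly one partner $y_2 > 2$ satisfying (19'). An outer bisection then adjusts $y_1$ until the coverage in (19) equals $\tfrac{1}{2}$. The helper names are hypothetical.

```python
import math

def bisect(f, lo, hi, tol=1e-13):
    """Root of f on [lo, hi] by bisection, assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail(y):
    """(y + 1) e^{-y}: the terms of (19) with y = theta0 * x."""
    return (y + 1.0) * math.exp(-y)

def g(y):
    """y^2 e^{-y}: (19') says g(theta0 * x1) = g(theta0 * x2); g peaks at y = 2."""
    return y * y * math.exp(-y)

def partner(y1):
    """The y2 > 2 with g(y2) = g(y1), for 0 < y1 < 2."""
    return bisect(lambda y: g(y) - g(y1), 2.0, 60.0)

def half_interval(theta0):
    """(x1, x2) satisfying (19) with alpha = 1/2 together with (19')."""
    def excess_coverage(y1):
        return tail(y1) - tail(partner(y1)) - 0.5
    y1 = bisect(excess_coverage, 1e-9, 2.0 - 1e-9)
    return y1 / theta0, partner(y1) / theta0

x1, x2 = half_interval(1.0)
print(x1, x2)
```

Because the solution scales as $x = y/\vartheta_0$, one computation serves every $\vartheta_0 > 0$.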

In all these examples it turned out that at least one interval with the success rate $S = \tfrac{1}{2}$ (the highest value for a distribution continuous with respect to $\vartheta$) exists. It seems that this is a common property of most usual distribution functions $P(x \mid \vartheta)$. But we can easily give an example where the greatest $S$, at least for a single interval as region of acceptance, is smaller than $\tfrac{1}{2}$. Assume

(20)  $$P(x \mid \vartheta) = x + \vartheta x(1 - x)(2\vartheta^2 x - 1), \qquad 0 \le x \le 1, \quad -1 \le \vartheta \le 1,$$

and let $\vartheta_0 = 0$ be the value subjected to testing. For any interval beginning at $x$ and extending to $x + 1 - \alpha$ we find

(21)  $$P(A \mid \vartheta) = 1 - \alpha + a\vartheta + b\vartheta^3, \qquad a = (1 - \alpha)(2x - \alpha), \qquad b = 2(1 - \alpha)(-3x^2 + 3\alpha x - \alpha^2 + \alpha - x).$$

It is a necessary condition for a test with $S = \tfrac{1}{2}$, in the case of a differentiable $P(A \mid \vartheta)$, that the derivative of $P(A \mid \vartheta)$ vanish at $\vartheta = \vartheta_0$. Thus we must have

(22)  $$\frac{\partial P(A \mid \vartheta)}{\partial \vartheta} = a + 3b\vartheta^2 = 0 \qquad\text{for}\qquad \vartheta = 0.$$

This shows that $2x - \alpha$ must be zero, or $x = \alpha/2$. On the other hand, for $\alpha = \tfrac{1}{2}$, $x = \tfrac{1}{4}$ the formula for $P(A \mid \vartheta)$ becomes

$$P(A \mid \vartheta) = \tfrac{1}{2} + \tfrac{3}{16}\vartheta^3.$$

Thus $P$ has an inflexion point at $\vartheta = 0$, and its maximum $\beta$ must be greater than $\tfrac{1}{2}$. In the present example, as $\vartheta$ goes up to 1, we have $\beta = 11/16$, and the success rate is $S = 5/16$. This does not exclude the existence of intervals with a success rate between $5/16$ and $\tfrac{1}{2}$. E.g. for $x = 0.45$ and $\alpha = \tfrac{1}{2}$ one finds the maximum $\beta = 0.60$ and thus $S = 0.40$. The optimum interval can be found by differentiating the formula for $P(A \mid \vartheta)$ with respect to $x$ and $\alpha$.
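The figures quoted for (20) and (21) can be checked with a short sketch. It evaluates (21) directly and approximates $\beta$ by a maximum over a fine grid on $-1 \le \vartheta \le 1$. The grid size is an arbitrary choice.

```python
def p_accept(theta, x, alpha):
    """(21): P(A | theta) = 1 - alpha + a*theta + b*theta^3 for A = [x, x + 1 - alpha]."""
    a = (1 - alpha) * (2 * x - alpha)
    b = 2 * (1 - alpha) * (-3 * x * x + 3 * alpha * x - alpha * alpha + alpha - x)
    return 1 - alpha + a * theta + b * theta ** 3

def rates(x, alpha, grid_size=200_001):
    """(beta, S), with beta approximated on a grid over -1 <= theta <= 1, theta != 0."""
    thetas = (-1 + 2 * k / (grid_size - 1) for k in range(grid_size))
    beta = max(p_accept(t, x, alpha) for t in thetas if t != 0)
    return beta, min(1 - alpha, 1 - beta)

print(rates(0.25, 0.5))   # (0.6875, 0.3125), i.e. beta = 11/16 and S = 5/16
print(rates(0.45, 0.5))
```

The first call reproduces $\beta = 11/16$ and $S = 5/16$ exactly. The second gives $\beta \approx 0.595$ and $S \approx 0.405$, consistent with the rounded values 0.60 and 0.40 quoted above.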

Examples with $\vartheta$ restricted to distinct values will be discussed in section 8.

6. Composite hypotheses. We have the problem of testing a composite hypothesis if, instead of one value $\vartheta_0$, a region $H$ of $\vartheta$-values is given and the assertions to be made in the course of experiments are “$\vartheta$ belongs to $H$” or “$\vartheta$ does not belong to $H$”. The solution developed in section 4 applies to this case almost without modification.

Again, let $P(A \mid \vartheta)$ be the probability of $x$ falling in the region of acceptance $A$. By $\bar{A}$ and $R$ we denote the regions complementary to $A$ in the sample space and to $H$ in the $\vartheta$-space. Then the error chance is

(23)  $$P_E = \int_{(H)} \big[1 - P(A \mid \vartheta)\big]\, dP_0(\vartheta) + \int_{(R)} P(A \mid \vartheta)\, dP_0(\vartheta).$$

This is an obvious generalisation of (2). The equation expresses the fact that an error is committed each time $x$ falls in $\bar{A}$ and $\vartheta$ in $H$, or $x$ in $A$ and $\vartheta$ in $R$. Let us use the notations

(24)  $$\alpha = \operatorname{l.u.b.}\ P(\bar{A} \mid \vartheta)\ \text{for}\ \vartheta\ \text{in}\ H, \qquad \beta = \operatorname{l.u.b.}\ P(A \mid \vartheta)\ \text{for}\ \vartheta\ \text{in}\ R,$$

and let $\pi_0$ denote the total weight that $P_0(\vartheta)$ assigns to $H$. Then the first of the two integrals in (23) cannot be greater than $\alpha\pi_0$, and the second not greater than $\beta(1 - \pi_0)$. On the other hand, no lower upper bound exists for either of these integrals if $\pi_0$ is given and $P_0(\vartheta)$ is subjected to no other restriction.

As $\pi_0$ varies between 0 and 1, the expression

$$\alpha\pi_0 + \beta(1 - \pi_0)$$

has its extreme values at the points $\pi_0 = 0$ and $\pi_0 = 1$, and these values are $\beta$ and $\alpha$, respectively. Accordingly, the greater of the quantities $\alpha$ and $\beta$ is the l.u.b. of $P_E$, and the success rate $S$ equals the smaller of the two quantities $1 - \alpha$ and $1 - \beta$. If $P(A \mid \vartheta)$ is continuous with respect to $\vartheta$, we have again $\beta \ge 1 - \alpha$; thus $\alpha$ and $\beta$ cannot both be smaller than $\tfrac{1}{2}$, and no $S$ can become greater than $\tfrac{1}{2}$.

If the hypothesis that $\vartheta$ lies in $H$ is tested by means of a region of acceptance $A$, the success rate of this test equals the smaller of the two quantities $1 - \alpha$ and $1 - \beta$, which are the minimum of $P(A \mid \vartheta)$ for $\vartheta$-values in $H$ and the minimum of $P(\bar{A} \mid \vartheta)$ for $\vartheta$-values outside $H$. The task of finding the region $A$ with highest success rate is thus reduced to a simple maximum-minimum problem.

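As an example let us take the density function

(25)  $$p(x \mid \vartheta) = f(x - \vartheta),$$

where $f(u)$ has a maximum at $u = 0$ and drops on both sides symmetrically and monotonically towards zero. The hypothesis to be tested may be given as

$$-b \le \vartheta \le b.$$

We find, if the interval $x_1, x_2$ is taken for region of acceptance,

(26)  $$P(A \mid \vartheta) = \int_{x_1}^{x_2} f(x - \vartheta)\, dx = \int_{x_1 - \vartheta}^{x_2 - \vartheta} f(u)\, du.$$

This function of $\vartheta$ has its maximum at $\tfrac{1}{2}(x_1 + x_2)$ and drops symmetrically on both sides. If $\tfrac{1}{2}(x_1 + x_2)$ is supposed to lie in the interval $(0, b)$, the minimum over $H$ is reached at $\vartheta = -b$ and the l.u.b. over $R$ is approached at $\vartheta = b$, so that we find

(27)  $$1 - \alpha = \int_{x_1 + b}^{x_2 + b} f(u)\, du, \qquad \beta = \int_{x_1 - b}^{x_2 - b} f(u)\, du.$$

Both quantities reach the value $\tfrac{1}{2}$ if we choose $x_2 = -x_1 = a$ and take for $a$ the uniquely determined solution of

$$\int_{-a+b}^{a+b} f(u)\, du = \int_{-a-b}^{a-b} f(u)\, du = \tfrac{1}{2}.$$

For this interval the success rate has its highest possible value, $\tfrac{1}{2}$.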
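To make the example concrete, the sketch below assumes that $f$ is the standard normal density; any symmetric unimodal $f$ would do. It finds $a$ by bisection, using the fact that the probability of the interval $(b - a, b + a)$ grows with $a$. The function names are illustrative.

```python
import math

def std_normal_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def mass(lo, hi):
    """Integral of the assumed f (standard normal density) from lo to hi."""
    return std_normal_cdf(hi) - std_normal_cdf(lo)

def half_width(b, tol=1e-12):
    """The a with mass(b - a, b + a) = 1/2, i.e. x2 = -x1 = a for the hypothesis -b <= theta <= b."""
    lo, hi = 0.0, b + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass(b - mid, b + mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for b in (0.0, 0.5, 1.0):
    a = half_width(b)
    print(b, round(a, 4), round(mass(b - a, b + a), 6), round(mass(-a - b, a - b), 6))
```

For $b = 0$ the hypothesis reduces to a simple one, and $a$ becomes the quartile 0.6745 of the standard normal, i.e. the probable limit of (15) in this scaling. Larger $b$ requires a wider interval.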

7. Case of n variates and k parameters. The analysis given in section 4 for a simple hypothesis and in section 6 for a composite one extends immediately to the case where, instead of one variate $X$ and one parameter $\vartheta$, a group of $n$ variates $X_1, X_2, \dots, X_n$ and a group of $k$ parameters $\vartheta_1, \vartheta_2, \dots, \vartheta_k$ are in question. The region of acceptance $A$ is now a portion of the $n$-dimensional sample space, determined by an interval of a function $F(x_1, x_2, \dots, x_n)$. The hypothesis to be tested will consist in assuming that the point $\vartheta_1, \vartheta_2, \dots, \vartheta_k$ falls into a certain region $H$ of the $k$-dimensional parameter space. The success rate of such a test is again the smaller of the numbers $1 - \alpha$ and $1 - \beta$, where $\alpha$ and $\beta$ are defined in exactly the same way as in the preceding section. The minimum of $P(A \mid \vartheta)$ when the $\vartheta$-values fall into $H$ is called $1 - \alpha$, and the maximum of the same function for all $\vartheta$-combinations belonging to the complementary region $R$ is $\beta$.

If the test function $F(x_1, x_2, \dots, x_n)$ is known, the interval with the highest success rate can be found on the same lines as in the case of one variate. In fact, the quantity $F$ takes the place of $x$ in the former analysis. If the interval thus found has the success rate $\tfrac{1}{2}$, we know that no other test exists which would have a higher success rate, as long as nothing is known about the a priori distribution in the parameter space. If a certain $F(x_1, x_2, \dots, x_n)$ does not lead to an interval with success rate $\tfrac{1}{2}$, one may try another test function. In the most general case, the test function $F$ with the highest success rate would be found by solving the problem of the calculus of variations that consists in maximizing the smaller of $1 - \alpha$ and $1 - \beta$. As a rule, such an elaborate analysis will not be necessary.

To ask that a test be a most powerful one is too much and too little. It is too much since such a test does not exist in most cases. It is too little because there can exist another test (on a different level of significance) with a considerably higher success rate. The correct description of a most powerful test is that such a test can be shown, in a simple way, to have no smaller success chance, whatever $P_0(\vartheta)$ is, than a group of other tests. If a most powerful test exists, it may be considered preferable to all other tests of the same success rate, but there is no reason why it should be considered more favorable than any test with higher success rate. As to unbiased tests and other substitutes for most powerful tests, nothing at all can be said about their merits as compared with those of other tests.

A simple example of tests with the highest possible success rate in the case of several dimensions is the following. Assume a density function

(28)  $$p(x \mid \vartheta) = f(x_1 - \vartheta_1, x_2 - \vartheta_2, \dots, x_n - \vartheta_n),$$

where $f(u_1, u_2, \dots, u_n)$ depends on the absolute values $|u_1|, |u_2|, \dots, |u_n|$ only and decreases monotonically with increasing $u_1^2 + u_2^2 + \dots + u_n^2$ in all directions. The parameter point $\vartheta_1 = \vartheta_2 = \dots = \vartheta_n = 0$ is to be tested. Let $F(x_1, x_2, \dots, x_n)$ be a function likewise depending on $|x_1|, |x_2|, \dots, |x_n|$ only, vanishing at the origin, and monotonically increasing with $x_1^2 + x_2^2 + \dots + x_n^2$. Then the set of points for which

(29)  $$F(x_1, x_2, \dots, x_n) \le C$$

is a region of acceptance with success rate $\tfrac{1}{2}$ if $C$ is chosen so as to have

(30)  $$\int_{(F \le C)} f(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \cdots dx_n = \tfrac{1}{2}.$$

This applies e.g. to normal populations. The proof is obvious.
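For a concrete instance of (29) and (30), take $n = 2$, let $f$ be the standard bivariate normal density, and choose $F = x_1^2 + x_2^2$. All three are illustrative choices that satisfy the stated conditions. At the origin $F$ then has a chi-square distribution with two degrees of freedom, so $P(F \le C) = 1 - e^{-C/2}$, and (30) gives $C = 2\ln 2$. The sketch computes this value and checks it by simulation.

```python
import math
import random

def acceptance_constant_2d():
    """C in (30) for n = 2, f = standard bivariate normal and F = x1^2 + x2^2.

    F is chi-square with 2 degrees of freedom, P(F <= C) = 1 - exp(-C/2),
    so (30) gives C = 2 ln 2.
    """
    return 2.0 * math.log(2.0)

def coverage_at_origin(C, n_draws=100_000, seed=12345):
    """Monte Carlo estimate of P(F <= C) when theta1 = theta2 = 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if x1 * x1 + x2 * x2 <= C:
            hits += 1
    return hits / n_draws

C = acceptance_constant_2d()
print(round(C, 4), coverage_at_origin(C))   # 1.3863 and an estimate close to 0.5
```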

8. Distinct parameter values. Tests with a higher success rate than $\tfrac{1}{2}$ can be found if the parameter $\vartheta$ is restricted to a set of distinct values. Take for instance our first example in section 3, and assume that $\vartheta$ can only take the three values $0, \pm 1$. Then in the second expression (9) for the error chance, the integral cannot approach the value zero, since the region $R$ does not include the point $\vartheta = 0$. The minimum value of the integral is $1 - \pi_0$, and thus

(31)  $$P'_E \le \alpha\pi_0 + (1 - \alpha)\left[1 - \frac{\alpha(2 - \alpha)}{3}\right](1 - \pi_0).$$

The success rate is the smaller of the two quantities

$$1 - \alpha \qquad\text{and}\qquad 1 - (1 - \alpha)\left[1 - \frac{\alpha(2 - \alpha)}{3}\right] = 1 - \beta.$$

The best value of $\alpha$ is found by equating $\alpha$ and $\beta$. This gives about $\alpha = \beta = 0.436$ and the success rate $S = 0.564$, for the region of acceptance $x = 0$ to $x = 0.564$. Other intervals or sets of intervals can be examined in the same way.
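The balancing condition $\alpha = \beta$ can be solved by bisection, since $\alpha - \beta(\alpha)$ increases from $-1$ to $1$ on $0 \le \alpha \le 1$. A minimal sketch, with illustrative function names:

```python
def beta(alpha):
    """beta = (1 - alpha)[1 - alpha(2 - alpha)/3] for A' = [0, 1 - alpha], theta in {0, +1, -1}."""
    return (1 - alpha) * (1 - alpha * (2 - alpha) / 3)

def balanced_alpha(tol=1e-12):
    """Solve alpha = beta(alpha) by bisection; alpha - beta(alpha) is increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - beta(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = balanced_alpha()
print(round(alpha, 3), round(1 - alpha, 3))   # 0.436 0.564
```

The result, $\alpha \approx 0.436$ with a success rate of about 0.564, agrees with the values stated in the text.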

A more impressive example is the following. We draw $n = 12$ times from an urn which contains three balls, some black and some white. The observed value $x$ is the number of white balls drawn. The probability of getting a white ball in one experiment can have one of the four values $0, 1/3, 2/3, 1$, and we want to test the hypothesis $\vartheta = \vartheta_0 = 1/3$. The probability distribution is given by

(32)  $$p(x \mid \vartheta) = \binom{n}{x}\vartheta^x(1 - \vartheta)^{n - x}.$$

Let us choose the set of points $x = 1, 2, \dots, 6$ as region of acceptance. Then

(33)  $$P(A \mid \vartheta) = \sum_{x=1}^{6}\binom{12}{x}\vartheta^x(1 - \vartheta)^{12 - x}.$$

This sum can be computed for the 4 possible $\vartheta$-values:

$$\begin{array}{c|cccc} \vartheta & 0 & 1/3 & 2/3 & 1 \\ \hline P(A \mid \vartheta) & 0 & 0.926 & 0.178 & 0 \end{array}$$

Thus $1 - \alpha$ has the value 0.926 and $\beta$ equals 0.178. The success rate is the smaller of the two quantities 0.926 and 0.822; thus $S = 0.822$. If we restrict the region of acceptance to the points $x = 1$ to 5, the values of $1 - \alpha$ and $1 - \beta$ become 0.815 and 0.934; thus the success rate is $S = 0.815$. In the first case we have more than 82% chance of making a correct assertion, whatever the a priori probability of $\vartheta$ may be!
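The sums in (33) can be evaluated exactly with rational arithmetic. The sketch below assumes sampling with replacement, so that (32) is binomial, and recomputes the table together with both success rates.

```python
from fractions import Fraction
from math import comb

def p_accept(theta, region, n=12):
    """(33): P(A | theta) = sum over x in A of C(n, x) theta^x (1 - theta)^(n - x), exactly."""
    return sum(comb(n, x) * theta ** x * (1 - theta) ** (n - x) for x in region)

thetas = [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]
theta0 = Fraction(1, 3)

for region in (range(1, 7), range(1, 6)):        # x = 1..6, then x = 1..5
    one_minus_alpha = p_accept(theta0, region)
    beta = max(p_accept(t, region) for t in thetas if t != theta0)
    S = min(one_minus_alpha, 1 - beta)
    print([round(float(p_accept(t, region)), 3) for t in thetas], round(float(S), 3))
# [0.0, 0.926, 0.178, 0.0] 0.822
# [0.0, 0.815, 0.066, 0.0] 0.815
```

Exact fractions avoid any rounding doubt. For example, $P(A \mid 1/3) = 492032/531441$ for the region $x = 1, \dots, 6$.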
It is obvious that this result will become more and more strongly marked if the number of observations increases. This is connected with the subject of the next section.

9. Asymptotically increasing success rate. It seems strange that, in the case of a continuously varying parameter and a distribution $P(x \mid \vartheta)$ which is continuous with respect to $\vartheta$, no test can have a success rate greater than $\tfrac{1}{2}$. One has the feeling that something might happen in the continuous problems similar to what was the case in the example of section 8. On the other hand, our proof that $S \le \tfrac{1}{2}$, in sections 4 and 6, is conclusive, and it applies to problems in more than one dimension also. The answer is that in the kind of problems where a large number of observations is involved, a definite restrictive assumption about the over-all distribution $P_0(\vartheta)$ is silently introduced.

The problems we have here in mind are connected with sequences of distributions of the form

(34)  $$P_n(x \mid \vartheta) = \Phi_n(x - \vartheta),$$

where $\Phi_1(u), \Phi_2(u), \Phi_3(u), \dots$ are cumulative distribution functions for distributions more and more concentrated around one point, say $u = 0$. In a rigorous form, the sequence $\Phi_n(u)$ can be described by the following statement: for each $\epsilon, \eta > 0$ there exists a number $N(\epsilon, \eta)$ such that

(35)  $$\Phi_n(\eta) - \Phi_n(-\eta) \ge 1 - \epsilon \qquad\text{for}\qquad n > N(\epsilon, \eta).$$

One wants to test the hypothesis

$$-b \le \vartheta \le b$$

under the assumption that the parameter distribution does not depend on $n$. In this case, as we shall show, one can find for each $\epsilon > 0$ a region of acceptance $A$ such that the success rate $S_n$ of the test corresponding to this $A$ and to $P_n(x \mid \vartheta)$ is greater than $1 - \epsilon$ for sufficiently large $n$.

We divide the region $R$, i.e. $|\vartheta| > b$, into two parts $R_1$ and $R_2$, where $R_1$ consists of the points $|\vartheta| \le b + 2\eta$ and $\eta$ satisfies the condition

(36)  $$\int_{(R_1)} dP_0(\vartheta) \le \frac{\epsilon}{3}.$$

Then the region of acceptance will be

$$-a = -b - \eta \le x \le b + \eta = a,$$

and the probability of $x$ falling in this region is

(37)  $$P_n(A \mid \vartheta) = \Phi_n(b + \eta - \vartheta) - \Phi_n(-b - \eta - \vartheta).$$

As long as $\vartheta$ belongs to $H$, the right-hand side in (37) is not smaller than $\Phi_n(\eta) - \Phi_n(-\eta)$, and thus, according to (35), the error chance of first kind satisfies

(38)  $$P_I^{(n)} = \int_{(H)} \big[1 - P_n(A \mid \vartheta)\big]\, dP_0(\vartheta) \le 1 - \big[\Phi_n(\eta) - \Phi_n(-\eta)\big] \le \frac{\epsilon}{3} \qquad\text{for}\qquad n > N\!\left(\frac{\epsilon}{3}, \eta\right).$$

The error chance of second kind can be written as

(39)  $$P_{II}^{(n)} = \int_{(R_1)} P_n(A \mid \vartheta)\, dP_0(\vartheta) + \int_{(R_2)} P_n(A \mid \vartheta)\, dP_0(\vartheta).$$

The first of these integrals cannot be larger than $\epsilon/3$, according to (36), since $P_n(A \mid \vartheta) \le 1$. The second integral cannot exceed the maximum value of $P_n(A \mid \vartheta)$ for $\vartheta$ in $R_2$. But if $|\vartheta| > b + 2\eta$, the two arguments of $\Phi_n$ in (37) always have the same sign and are in absolute value greater than $\eta$. It then follows from (35), in connection with the fact that $\Phi_n(u)$ increases monotonically from 0 to 1, that the difference of the two $\Phi_n$-values cannot exceed $\epsilon/3$ for $n > N(\epsilon/3, \eta)$. Therefore

(40)  $$P_{II}^{(n)} \le \frac{\epsilon}{3} + \frac{\epsilon}{3} \qquad\text{and}\qquad S_n = 1 - P_I^{(n)} - P_{II}^{(n)} \ge 1 - \epsilon \qquad\text{for}\qquad n > N\!\left(\frac{\epsilon}{3}, \eta\right).$$
This result has a wide range of application in the cases where a hypothesis is tested on the basis of a large number of independent observations. Consider a sequence of variates $X_1, X_2, X_3, \dots$ subject to probability distributions $Q_1(x_1), Q_2(x_2), Q_3(x_3), \dots$. Let $x = F(x_1, x_2, \dots, x_n)$ be a statistical function, i.e. a function depending on the distribution of its $n$ variables only, and let $\vartheta$ be the expected value of $F$. Then the general law of large numbers states that the distribution of $x$ has the form (34), with $\Phi_n$ satisfying the inequality (35), if the $Q_n(x)$ fulfil certain conditions concerning mainly their behaviour at infinity.⁶ The proof of this theorem, which is the real source of most “asymptotical” properties of statistical tests, was given for the first time in 1936. The particular case where $F$ is the arithmetical mean of the $n$ variables $X_1, X_2, \dots, X_n$ has been known as Tchebychef’s theorem since 1867.

Applying this general law of large numbers, we can now state the following fact. In testing a hypothesis about the expected value $\vartheta$ of any regular statistical function of $n$ variates, we can reach a success rate $1 - \epsilon$, no matter how small $\epsilon$ is, if the number $n$ increases indefinitely and the initial distribution of $\vartheta$ is supposed to be independent of $n$. On the other hand, no test with a success rate greater than $\tfrac{1}{2}$ is available if an assumption of this type is not used.

⁶ For exact conditions see ref. [3].
10. Summary. In this paper a solution of the problem of testing hypotheses is presented in the following sense. It is assumed that a probability distribution depending on some parameters is given and that nothing is known about the initial distribution of these parameters. For any simple or composite hypothesis about the parameters and any region of acceptance chosen in the sample space, the success rate $S$ is computed, i.e. the minimum chance of getting right answers out of the test. From the formulae given for $S$, a test with highest success rate can easily be found in each case.

This theory shares its point of departure with the theory actually in use, which leads to the concept of most powerful tests. A most powerful test is described as a test which, by simple reasoning, can be seen to have no smaller success chance than any other test at the same "level of significance" α. In the rare cases where most powerful tests exist for all α-values, one of them, with an α-value singled out by our theory, has the highest success rate and then is preferable to all other tests which might have the same success rate. In all other cases our method supplies a test of highest success rate, without relation to "unbiased" tests or other current substitutes for most powerful tests.

Some of the main results are: No test has a success rate > ½ if nothing is known about the parameters except the limits of their values and if the given distribution is a continuous function of the parameters. The success rate can be higher if the parameters are restricted to certain distinct values. A success rate as close to 1 as desired can be reached in a sequence of tests based on an increasing number n of observations, if the initial distribution of the parameters is known to be independent of n.
ON THE RELIABILITY OF THE CLASSICAL CHI-SQUARE TEST

By E. J. Gumbel
New School for Social Research 

For a given set of observations of a continuous variate, different classifications lead to different observed distributions and to different values of $\chi^2$. This shortcoming has been vaguely felt by statisticians. We shall explain how these differences arise and show that they are large enough to cast a great deal of doubt on the validity of applying the usual $\chi^2$ method to a continuous variate. Finally, we propose a procedure which is free from these difficulties.

1. The observed distributions. The $\chi^2$ method gives a numerical measure of the differences between the observed and the theoretical distribution. A theoretical distribution is completely determined once its constants are known. For a discontinuous variate the observed distribution is also well defined; but for a continuous variate the concept "observed distribution" is vague. To classify N observations $x_1, x_2, \dots, x_m, \dots, x_N$, arranged in increasing order, we introduce two arbitrary actions: the choice of the intervals and the choice of the beginning of the first cell. As a rule, all cells have the same length, and they are bounded by integers, even numbers, or round numbers (0, 5, 10, ...) of the variate. But these classifications, and the preference given to round numbers for the starting point, have no theoretical foundation.

A certain guide for the systematic choice of the class length and of the beginning of the first cell may be found by turning to the theory. Many theoretical distributions of a continuous variate x have only two constants and permit the introduction of a dimensionless reduced variate y, where

(1) $y = \dfrac{x - a}{b}$.

The constant a is a mean, and b is a measure of dispersion. The probabilities W(x) (or F(y)) of values equal to or less than x (or y) are

(2) $W(x) = F(y)$.

For most distributions for which the above transformation is possible, tables of F(y) exist in which the argument progresses by a fixed interval Δy. By taking an initial value $y_0$ and a fixed interval Δy, the differences

(3) $NF(y_0 + i\Delta y) - NF(y_0 + (i - 1)\Delta y) = Np_i \qquad (i = 1, 2, \dots, k)$

may be interpreted as the theoretical distribution. The corresponding values of the variate are, by (1),

(4) $x(i) = a + b(y_0 + i\Delta y); \qquad x(i - 1) = a + b(y_0 + (i - 1)\Delta y)$,

and the cell length is

(5) $\Delta x = b\,\Delta y$.

In (3), k is the number of cells. In general, x(i) and x(i − 1) will not coincide with any of the observed values $x_m$. By arranging the observations in the cells given by the theoretical values (4), we obtain an observed distribution consisting of the contents $a_i$ of the cells i. This procedure prescribes a classification of the observations according to the theory. The intervals selected are multiples of a measure of dispersion. In principle, the choice of Δy and of the starting point $y_0$ remains arbitrary; in practice, the choice of Δy is limited by the intervals given in the probability tables.
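The classification defined by (3) to (5) can be sketched as follows. This is an illustration under stated assumptions: `F` is any tabulated probability function of the reduced variate, and the double-exponential form used in the example (it appears later as (14)) together with the constants a = 1201.9, b = 266.1 are borrowed from the numerical example only to have concrete numbers. Function names are hypothetical.

```python
import math
from typing import Callable


def cell_limits(a: float, b: float, y0: float, dy: float, k: int) -> list[float]:
    """Cell limits x(0), ..., x(k) in units of the variate, by (4): x(i) = a + b(y0 + i dy)."""
    return [a + b * (y0 + i * dy) for i in range(k + 1)]


def expected_contents(F: Callable[[float], float], n_obs: int,
                      y0: float, dy: float, k: int) -> list[float]:
    """Theoretical contents N p_i of cells 1..k, by (3), for a tabulated probability F(y)."""
    return [n_obs * (F(y0 + i * dy) - F(y0 + (i - 1) * dy)) for i in range(1, k + 1)]


def F(y: float) -> float:
    """Illustrative F(y): the double-exponential law of largest values."""
    return math.exp(-math.exp(-y))


if __name__ == "__main__":
    limits = cell_limits(a=1201.9, b=266.1, y0=-1.75, dy=0.25, k=4)
    contents = expected_contents(F, 50, y0=-1.75, dy=0.25, k=4)
    print([round(x, 1) for x in limits], [round(c, 4) for c in contents])
```

Every cell has the same length $b\,\Delta y$ by (5), so the limits are equally spaced in x as well as in y.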

This natural classification may be used for constructing different observed distributions from the same set of observations. We determine the constants, then choose a small interval and a starting point below the smallest observation $x_1$. The last cell is chosen so that it contains the largest observation $x_N$. In this way we obtain the initial observed distribution, consisting of k cells.

If we combine h cells ($h = 2, 3, \dots$), we obtain h different observed distributions. For the first of them, we combine h − 1 void cells with the first cell of the initial distribution, then the second cell with the following h − 1 cells of the initial distribution, and so on. Generally, we combine q void cells (q = h − 1, h − 2, ..., 1, 0) with the first h − q cells of the initial distribution, then the next h cells of the initial distribution, and so on. The last of these h distributions starts with the first h cells of the initial distribution.
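The construction of the h observed distributions from the initial one can be sketched as below. `combine_cells` is a hypothetical helper that puts q void cells in front and sums consecutive groups of h; keeping a final incomplete group as the last cell is an assumption about how the upper end is handled.

```python
def combine_cells(cells: list[int], h: int, q: int) -> list[int]:
    """Combine the initial cells in groups of h after prefixing q void (empty) cells.

    q runs from h - 1 down to 0, so the first group holds the first h - q initial cells.
    A final incomplete group is kept as it is.
    """
    if not 0 <= q < h:
        raise ValueError("q must satisfy 0 <= q <= h - 1")
    padded = [0] * q + list(cells)
    return [sum(padded[j:j + h]) for j in range(0, len(padded), h)]


if __name__ == "__main__":
    initial = [1, 1, 3, 3, 5, 1, 3, 3, 3, 6, 6, 4, 2, 0, 2, 0, 2, 2, 0, 2, 0, 0, 0, 0, 1]
    h = 4
    for q in range(h - 1, -1, -1):      # the h distributions, q = h - 1, ..., 0
        print(q, combine_cells(initial, h, q))
```

The list `initial` holds the observed contents of Table I; the loop yields the four distributions given by this rule before any combination of cells with small expected numbers.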

If we combine more and more cells, the number of observed distributions having the same intervals increases. The larger the intervals, the larger is the influence of the starting point, and the more dissimilar the observed distributions become. To see this influence of the classification on the shape of the observed distributions, consider the extreme case of a symmetrical theoretical distribution of an unlimited variate. Let the observed distribution consist of two cells, and assume also that the observed median is close to the theoretical one. If the cut between the cells coincides with the theoretical median, the two cells have the contents N/2 + ε and N/2 − ε, where ε is small. If the cut is shifted sufficiently far to the left or right of the median, the cell contents will be 0, N or N, 0. These distributions are completely different.

To each observed distribution corresponds a theoretical one, obtained from (3) by the same combination of cells as the observed distribution. In the graphical representation, the same continuous theoretical distribution may be used for all observed distributions by choosing the scale of the ordinate properly: the length chosen to represent one observation in the initial distribution will represent h observations for the h distributions obtained by combining h cells.

The different observed distributions corresponding to the same observations and to the same theory will give different values of

(6) $\chi^2 = \sum_{i=1}^{k} \dfrac{(a_i - Np_i)^2}{Np_i}$.
The expected contents of the first and last cells are

(7) $Np_1 = NF(y_0 + \Delta y)$,

(8) $Np_k = N\left(1 - F(y_0 + (k - 1)\Delta y)\right)$.

Since the total expected frequency must be equal to the number of observations,

(9) $\sum_{i=1}^{k} Np_i = \sum_{i=1}^{k} a_i = N$,

formula (6) may be written

(10) $\chi^2 = \sum_{i=1}^{k} \dfrac{a_i^2}{Np_i} - N$.

This formula, being simpler than (6), will be used in the numerical example. 
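Formulae (6) and (10) give the same number whenever (9) holds. The short sketch below (hypothetical function names, arbitrary illustrative counts) evaluates both.

```python
def chi_square(observed: list[float], expected: list[float]) -> float:
    """Formula (6): sum over cells of (a_i - N p_i)^2 / (N p_i)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((a - e) ** 2 / e for a, e in zip(observed, expected))


def chi_square_short(observed: list[float], expected: list[float]) -> float:
    """Formula (10): sum of a_i^2 / (N p_i), minus N; assumes (9), sum(expected) == sum(observed)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    n_obs = sum(observed)
    return sum(a * a / e for a, e in zip(observed, expected)) - n_obs


if __name__ == "__main__":
    obs = [10, 20, 30]
    exp_ = [15.0, 20.0, 25.0]          # same total N = 60, as (9) requires
    print(chi_square(obs, exp_), chi_square_short(obs, exp_))
```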

An upper limit for $\chi^2$ is furnished by the case in which one cell j contains all observations. Then

$a_j = N; \qquad a_i = 0 \ \text{for}\ i \neq j$,

whence from (10)

(11) $\chi^2 = \dfrac{N}{p_j} - N$.

The upper limit depends again upon the intervals and the starting point of the 
classification. If the probability for an observation to be contained in the cell 
j is small, the upper limit is large. 
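Formula (11) as a minimal sketch: it evaluates the single-cell case and shows how the limit grows as $p_j$ shrinks. The function name is hypothetical.

```python
def chi_square_one_full_cell(n_obs: int, p_j: float) -> float:
    """Formula (11): chi^2 when all N observations fall in cell j of probability p_j."""
    if not 0 < p_j <= 1:
        raise ValueError("p_j must lie in (0, 1]")
    return n_obs / p_j - n_obs


if __name__ == "__main__":
    for p_j in (0.5, 0.1, 0.01):
        print(p_j, chi_square_one_full_cell(50, p_j))
```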

The exact distribution of $\chi^2$ has not yet been established. To obtain an approximation, it is assumed that a binomial distribution may be replaced by a normal distribution. As this does not hold for cells with a small expected frequency, the contents of such cells must be combined. This prescription, which is also valid for a discontinuous variate, constitutes a third arbitrary action in the calculation of $\chi^2$. It invalidates the prior postulate that all cells ought to have the same length.

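The approximation used for the probability P of obtaining a value of $\chi^2$ equal to or larger than the observed one is

(12) $P(\chi^2, \nu) = K_\nu \displaystyle\int_{\chi^2}^{\infty} t^{(\nu - 2)/2} e^{-t/2}\, dt, \qquad K_\nu = \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}$,

where ν is the number of degrees of freedom. Since

(13) $\dfrac{\partial P}{\partial \chi^2} < 0, \qquad \dfrac{\partial P}{\partial \nu} > 0$,

P diminishes as $\chi^2$ increases, ν being given, but P increases as ν increases, $\chi^2$ being given. By choosing larger cells, the number ν diminishes, and P may remain the same if $\chi^2$ diminishes adequately.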
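A sketch of $P(\chi^2, \nu)$ of (12) for integer ν, using only the Python standard library. It relies on the identity between (12) and the regularized upper incomplete gamma function $Q(\nu/2, \chi^2/2)$, starts from $Q(1, x) = e^{-x}$ or $Q(1/2, x) = \operatorname{erfc}(\sqrt{x})$, and climbs with the standard recurrence $Q(s + 1, x) = Q(s, x) + x^s e^{-x}/\Gamma(s + 1)$. The function name is hypothetical; any library chi-square survival function would serve equally well.

```python
import math


def chi_square_p(chi2: float, nu: int) -> float:
    """P(chi^2, nu) of (12): probability that chi^2 with nu degrees of freedom is at least chi2."""
    if nu < 1 or chi2 < 0:
        raise ValueError("need nu >= 1 and chi2 >= 0")
    x = chi2 / 2.0
    if nu % 2 == 0:
        s, total = 1.0, math.exp(-x)               # Q(1, x) = e^{-x}
    else:
        s, total = 0.5, math.erfc(math.sqrt(x))    # Q(1/2, x) = erfc(sqrt(x))
    # Upward recurrence Q(s + 1, x) = Q(s, x) + x^s e^{-x} / Gamma(s + 1), until s = nu / 2.
    while s < nu / 2:
        if x > 0:
            total += math.exp(s * math.log(x) - x - math.lgamma(s + 1))
        s += 1
    return total


if __name__ == "__main__":
    # (13): P falls as chi^2 rises (nu fixed) and rises as nu rises (chi^2 fixed)
    print(chi_square_p(4.0, 2), chi_square_p(5.0, 2), chi_square_p(5.0, 3))
```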

It is easy to see that $\chi^2$ cannot increase as a result of the combination of cells and will, in general, decrease. Let $a_1$ and $a_2$ represent the actual numbers of observations in two cells that are to be combined, and let $Np_1$ and $Np_2$ be the expected numbers. Then the contribution of the two separate cells to $\chi^2$, minus the contribution of the combined cell, is, by (10),

$\dfrac{a_1^2}{Np_1} + \dfrac{a_2^2}{Np_2} - \dfrac{a_1^2 + 2a_1a_2 + a_2^2}{N(p_1 + p_2)}$.

Since the common denominator $Np_1p_2(p_1 + p_2)$ is positive, the difference is proportional to

$a_1^2p_2^2 + a_2^2p_1^2 - 2a_1a_2p_1p_2 = (a_1p_2 - a_2p_1)^2 \ge 0$.

The equality holds only when $a_1 : a_2 = p_1 : p_2$. Then the combination of cells has no influence on $\chi^2$, but it reduces the number of degrees of freedom by one and diminishes the probability P. In the general case, the combination of cells diminishes $\chi^2$ and diminishes ν at the same time. According to (13), the first influence tends to increase the probability P, the second to diminish it. It cannot be stated a priori which influence is stronger.
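The algebra above can be checked numerically. The sketch compares the direct difference of contributions with the closed form $(a_1p_2 - a_2p_1)^2 / (Np_1p_2(p_1 + p_2))$; names and numbers are illustrative only.

```python
def drop_when_combined(a1: float, a2: float, p1: float, p2: float, n_obs: int) -> float:
    """Contribution of two cells to chi^2 + N under (10), minus that of the combined cell."""
    separate = a1 * a1 / (n_obs * p1) + a2 * a2 / (n_obs * p2)
    combined = (a1 + a2) ** 2 / (n_obs * (p1 + p2))
    return separate - combined


def drop_closed_form(a1: float, a2: float, p1: float, p2: float, n_obs: int) -> float:
    """The same quantity as (a1 p2 - a2 p1)^2 / (N p1 p2 (p1 + p2)), which is never negative."""
    return (a1 * p2 - a2 * p1) ** 2 / (n_obs * p1 * p2 * (p1 + p2))


if __name__ == "__main__":
    print(drop_when_combined(3, 7, 0.05, 0.2, 50), drop_closed_form(3, 7, 0.05, 0.2, 50))
    print(drop_when_combined(2, 4, 0.1, 0.2, 50))   # a1 : a2 = p1 : p2, so the drop vanishes
```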

For a given set of observations of a continuous variate and a given theory, including given estimates of the constants, the probability P depends upon three arbitrary actions. If a certain choice of the intervals gives a good fit, it cannot be concluded that a broader classification gives the same or a better fit [4]. For a given interval, P may vary considerably with the starting point. This influence cannot be allowed for by any formula, as the number of degrees of freedom does not depend upon the starting point. Finally, the term "small expected numbers" is vague, and different combinations of cells lead to different probabilities. It is generally assumed that these influences remain within reasonable limits and that P does not vary considerably if we change the class length or the starting point. In the following example, we shall show that this opinion is erroneous.

2. Numerical example. The flood discharge of the Mississippi River at Vicksburg for each of the fifty years 1890-1939 will be used to illustrate the extent to which the observed distributions and P vary with the choice of cell length and of the starting point. The observed flood discharges $x_m$, measured in 1,000 cubic feet per second, are given in Table VI of a previous article [2] and are not repeated here. The expected distribution is given by the theory of largest values, which states that the probability W(x) of a flood discharge equal to or less than x is

(14) $W(x) = e^{-e^{-y}}$.

Values of W(x) as a function of the reduced variate

(15) $y = \alpha(x - u)$

are given in Table II of the reference first cited.
Calculation of the constants α and u leads to the theoretical value of the flood discharge

(16) $x = 1201.9 + 266.1\,y$

associated with a given probability $F(y) = W(x)$.

TABLE I
Observed and theoretical distribution (1) for the interval Δy = .25; Δx = 66.525
(y and x are the lower limits of the cells; the first and last cells are open-ended)

         Variates                Distributions
Reduced   Absolute   Observed   Theoretical
      y          x        a_i         Np_i
    (1)        (2)        (3)          (4)
  -1.75      736.2          1        .5655
  -1.50      802.8          1         .959
  -1.25      869.3          3        1.775
  -1.00      935.8          3        2.720
   -.75     1002.3          5       3.5955
   -.50     1068.9          1       4.2315
   -.25     1135.4          3       4.5475
    .00     1201.9          3        4.554
    .25     1268.4          3        4.314
    .50     1334.9          6        3.914
    .75     1401.5          6        3.434
   1.00     1468.0          4        2.934
   1.25     1534.6          2       2.4565
   1.50     1601.1          0       2.0235
   1.75     1667.6          2        1.647
   2.00     1734.1          0       1.3270
   2.25     1800.6          2       1.0615
   2.50     1867.2          2         .844
   2.75     1933.7          0         .668
   3.00     2000.2          2         .527
   3.25     2066.7          0         .414
   3.50     2133.3          0         .325
   3.75     2199.8          0         .255
   4.00     2266.3          0        .1995
   4.25     2332.8          1         .708
  Total                    50       50.000


The first observed distribution presented in Table I is obtained by letting Δy = .25, Δx = 66.525, and $y_0 = -1.75$. The expected numbers of observations for the first and last cells are 50F(−1.5) and 50(1 − F(4.25)), respectively.
The expected frequencies (formula (3)) for the other cells,

$Np_i = 50\,[F(y + .25) - F(y)]$,

were obtained by successive subtraction of two consecutive figures given in column 2, Table II [2]. The theoretical and the observed distributions are plotted in figure 1. The observed distribution given in Table I is very irregular.
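The theoretical column of Table I can be recomputed from (7), (8), (14) and (16). A minimal sketch, assuming the double-exponential form $F(y) = e^{-e^{-y}}$ and the constants of (16); names are hypothetical. It prints y, x and $Np_i$ for each of the 25 cells.

```python
import math


def gumbel_F(y: float) -> float:
    """Double-exponential probability (14): F(y) = exp(-exp(-y))."""
    return math.exp(-math.exp(-y))


def expected_table_one(n_obs: int = 50, y0: float = -1.75, dy: float = 0.25,
                       k: int = 25) -> list[float]:
    """Theoretical contents Np_i of the k cells; first and last cells open, as in (7) and (8)."""
    inner = [y0 + i * dy for i in range(1, k)]          # y0 + dy, ..., y0 + (k - 1) dy
    cdf = [0.0] + [gumbel_F(y) for y in inner] + [1.0]
    return [n_obs * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


def discharge(y: float) -> float:
    """Formula (16): flood discharge x, in 1,000 cubic feet per second, for reduced variate y."""
    return 1201.9 + 266.1 * y


if __name__ == "__main__":
    for i, npi in enumerate(expected_table_one()):
        y = -1.75 + 0.25 * i                            # lower limit of cell i
        print(f"{y:6.2f} {discharge(y):8.1f} {npi:8.4f}")
```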

Evidently, the intervals are too small. Therefore we construct the observed and theoretical distributions (2) and (3) for cells which are twice as large.

The first cell in distribution (2) is obtained from distribution (1) by combining the first cell of (1) with the empty one before it; the second cell is obtained by combining the second and third cells of (1); and so on.

Distribution No. 3 is obtained by combining the first two cells of distribution No. 1, then the third and fourth, and so on. The observed distributions 2 and 3 and the theoretical distribution are plotted in figure 2. The scale of the ordinate is ½ of the scale in figure 1. In the same way, the three observed distributions (4), (5), (6) for the interval Δy = ¾, Δx = 199.57 are obtained by combining either two void cells with the first cell of Table I, or one void cell with the first and second cells of Table I, or the first three cells of Table I (see fig. 3).

Finally, the four observed distributions (7), (8), (9), (10) for the interval Δy = 1, Δx = 266.1 are compared with the theoretical distribution in figures 4 and 5. The four distributions 7-10 differ considerably. Distributions 8 and 9 indicate that the agreement between theory and observations is good; distributions 7 and 10 indicate that the fit is bad. The $\chi^2$ method must give the same contradictory results.


TABLE II
Four values of P(χ²) for the same observations and the same theory

 Mid-       Observed               Theoretical       Components of χ² + N
 points     distributions, a_i     distributions
  (1)    (2)  (3)  (4)  (5)        (6)       (7)      (8)      (9)     (10)
         (7) (10)  (9)  (8)       Np_i       (7)     (10)      (9)      (8)
  803                5          3.2995                       7.577
  869           8               6.0195             10.632
  936     13                    9.6150    17.577
 1002                    14    13.8465                                14.155
 1069               12         15.0945                       9.540
 1135          12              16.9285              8.506
 1202     10                   17.6470     5.667
 1268                    15    17.3295                                12.984
 1335               18         16.2160                      19.980
 1401          19              14.5960             24.733
 1468     18                   12.7385    25.435
 1534                    12    10.8480                                13.274
 1601                8          9.0610                       7.063
 1667           4               7.4540              2.146
 1734      4                    6.0590     2.641
 1800                     6     4.8795                                 7.378
 1867                7          6.3290                       7.742
 1933           7               5.0020              9.796
 2000      5                    3.9405     6.344
 2066                     3     3.0965                                 2.907
    N     50   50   50   50   200.0000
χ² + N                                    57.664   55.813   51.902   50.698
ν                                              2        2        2        2
P                                           .023     .057     .399     .705








The details of the calculation of $\chi^2$ are given in Table II. The numbers in column 1 are the midpoints of the cells. To save space, the four theoretical distributions obtained from Table I, col. 4, are written in the same column (6), directly opposite the corresponding observed distributions given in columns 2 to 5. Through formula (10) we calculate the components of $\chi^2 + N$ (cols. 7 to 10). Although the four distributions differ only with respect to the beginning of the first cell, the value of P for observed distribution (8) is more than thirty times the value of P for observed distribution (7). In view of the fact that these values of P are calculated for a fixed set of observations, with the same theory, the same constants, and the same number of degrees of freedom, the differences found are surprising.
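The whole of Table II can be recomputed from the observed contents of Table I. The sketch below assumes the double-exponential form (14) with the constants of (16), regroups the 25 cells of Table I into the five cells used for each of the distributions (7)-(10) (cell limits read off Table II: interior cells of width Δy = 1, open first and last cells), and evaluates (10). Since ν = 2 for all four, $P = e^{-\chi^2/2}$ exactly. All names are hypothetical.

```python
import math


def gumbel_F(y: float) -> float:
    """Double-exponential probability (14) in terms of the reduced variate y."""
    return math.exp(-math.exp(-y))


# Observed contents of the 25 cells of Table I; cell idx (0-based) ends at y = -1.75 + 0.25 (idx + 1).
TABLE_I_COUNTS = [1, 1, 3, 3, 5, 1, 3, 3, 3, 6, 6, 4, 2, 0, 2, 0, 2, 2, 0, 2, 0, 0, 0, 0, 1]
Y0, DY = -1.75, 0.25
N = sum(TABLE_I_COUNTS)  # 50 yearly floods

# Inner cell limits (reduced variate) of distributions (7)-(10): the starting point moves by 0.25.
INNER_LIMITS = {
    "(7)": [-0.5, 0.5, 1.5, 2.5],
    "(8)": [-0.25, 0.75, 1.75, 2.75],
    "(9)": [-1.0, 0.0, 1.0, 2.0],
    "(10)": [-0.75, 0.25, 1.25, 2.25],
}


def regroup(limits: list[float]) -> tuple[list[int], list[float]]:
    """Observed and expected contents of the cells bounded by the given inner limits."""
    observed = [0] * (len(limits) + 1)
    for idx, a in enumerate(TABLE_I_COUNTS):
        upper = Y0 + (idx + 1) * DY                   # upper limit of this Table I cell
        observed[sum(1 for lim in limits if lim < upper)] += a
    cdf = [0.0] + [gumbel_F(y) for y in limits] + [1.0]
    expected = [N * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
    return observed, expected


RESULTS = {}
for name, limits in INNER_LIMITS.items():
    observed, expected = regroup(limits)
    chi2_plus_n = sum(a * a / e for a, e in zip(observed, expected))  # (10) before subtracting N
    nu = len(observed) - 1 - 2             # five cells, one restriction (9), two fitted constants
    p_value = math.exp(-(chi2_plus_n - N) / 2)  # exact survival function of chi^2 when nu == 2
    RESULTS[name] = (observed, chi2_plus_n, nu, p_value)
    print(name, observed, round(chi2_plus_n, 3), nu, round(p_value, 3))
```

The values of $\chi^2 + N$ agree with the last columns of Table II. The third decimal of P computed this way differs slightly from the published figures for three of the four distributions (about .386 against .399 for distribution (9)), presumably because the published values were read from χ² tables; the ordering and the roughly thirtyfold ratio between (8) and (7) are unchanged.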

3. The probability integral transformation. This example shows that the probability P may vary with the starting point in such a way that no conclusion about the acceptance or rejection of a hypothesis can be obtained from the usual $\chi^2$ method. The three arbitrary steps described above may be avoided if we choose cells of equal probability instead of cells of equal length. The required intervals are obtained from the probability integral transformation, due to Karl Pearson [6]. Let w(x) be the distribution of a continuous variate x, and let y = W(x) be the transformed variate; then the distribution p(y) of the variate y is

(17) $p(y) = 1 \qquad (0 \le y \le 1)$.

In other words, the probabilities W(x) are uniformly distributed. If a distribution w(x) has been chosen for a given set of observations $x_m$, we can test this theory by investigating whether the "observations" $W(x_m)$, i.e., the theoretical cumulative frequencies of the observed values, are uniformly distributed. Thus the comparison of the observed distribution with any continuous theoretical distribution is reduced to the comparison of an "observed" with a theoretical uniform distribution. To a given set of observations and a given theory there corresponds one, and only one, "observed" distribution. If we introduce within w(x) another set of constants, or choose another theory φ(x) instead of w(x), we obtain, of course, other "observed" values [1].

The goodness of fit between this theory and these "observations" may be measured by the $\chi^2$ method. We divide the interval from zero to N, which contains the N "observed" numbers $NW(x_m)$, into k cells of equal length, and count the "observed" points $NW(x_m)$ contained in each cell. The starting point of the classification is always zero. The expected number of observations in each cell is always N/k. If we choose k sufficiently small, the necessity of combining cells is eliminated. We have to choose k in such a way that the conditions under which formula (12) holds are fulfilled. The question of the best choice of the number of cells has been studied by Wald and Mann [3]. Their solution is valid for small levels of significance and for large numbers of observations.
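The procedure just described can be sketched as follows: classify the numbers $NW(x_m)$ into k equal cells of [0, N] starting at zero and compare the counts with N/k per cell. The flood data are not reproduced in this paper, so the example uses synthetic values placed at the quantiles of the fitted double-exponential law (an assumption made only to have data); `pit_classification` and `W_flood` are hypothetical names.

```python
import math
from typing import Callable, Sequence


def pit_classification(xs: Sequence[float], W: Callable[[float], float],
                       k: int) -> tuple[list[int], float]:
    """Classify the 'observed' numbers N W(x_m) into k equal cells of [0, N]; return counts and chi^2.

    The starting point is always zero and every cell has expected content N / k.
    """
    n_obs = len(xs)
    counts = [0] * k
    for x in xs:
        position = n_obs * W(x)                          # 'observed' number in [0, N]
        cell = min(int(position / (n_obs / k)), k - 1)   # W(x) == 1 goes into the last cell
        counts[cell] += 1
    expected = n_obs / k
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return counts, chi2


def W_flood(x: float) -> float:
    """Illustrative theory: double-exponential W(x) with y = (x - 1201.9) / 266.1, from (16)."""
    return math.exp(-math.exp(-(x - 1201.9) / 266.1))


if __name__ == "__main__":
    # Synthetic 'observations' at the quantiles (m + 1/2) / 50 of the fitted law.
    xs = [1201.9 - 266.1 * math.log(-math.log((m + 0.5) / 50)) for m in range(50)]
    print(pit_classification(xs, W_flood, k=5))
```

Because every cell has the same expected content and the classification always starts at zero, neither the starting point nor the combination of small cells has to be chosen.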

4. Conclusion. The usual $\chi^2$ test is unreliable for a continuous variate, as it involves three arbitrary decisions. From the same observations, the same theory, and the same constants, different statisticians, equally well trained and equally careful, may obtain different probabilities P and may proclaim any one of these results as final. Therefore the usual $\chi^2$ method does not lead to a decision whether a hypothesis has to be rejected or not. Such a decision is possible if we use the probability integral transformation. Unfortunately, the question of the best choice of the cells for small numbers of observations and large levels of significance is not yet solved.
A SAMPLING INSPECTION PLAN FOR CONTINUOUS PRODUCTION¹

By H. F. Dodge 

Bell Telephone Laboratories, New York 
I. Introduction

1. Purpose. This paper presents a plan of sampling inspection for a product 
consisting of individual units (parts, subassemblies, finished articles, etc.) manu- 
factured in quantity by an essentially continuous process. 

The plan, applicable only to characteristics subject to nondestructive inspection on a Go-NoGo basis, is intended primarily for use in process inspection of parts or final inspection of finished articles within a manufacturing plant, where it is desired to have assurance that the percentage of defective units in accepted product will be held down to some prescribed low figure. It differs from others which have been published² in that it presumes a continuous flow of consecutive articles or consecutive lots of articles offered to the inspector for acceptance in the order of their production. It is accordingly of particular interest for products manufactured by conveyor or other straight-line continuous processes.

In operation, the plan provides a corrective inspection, serving as a partial screen for defective⁴ units. Normally, a chosen percentage or fraction f of the units is inspected, but when a defective unit is disclosed by the inspection, an additional number of units must be inspected, the additional number depending on how many more defective units are found. The result of such inspections is to remove some of the defective units, and the poorer the quality submitted to the inspector, as measured in terms of per cent defective, the greater will be the corrective or screening effect. The object of the plan is the same as that incorporated in some of the sampling tables already published,⁵ namely, to establish a limiting value of "average outgoing quality" expressed in per cent

¹ Presented at the Joint Meeting of the American Society of Mechanical Engineers and the Institute of Mathematical Statistics, May 29, 1943, by H. F. Dodge, Quality Results Engineer, Bell Telephone Laboratories, New York.

² H. F. Dodge and H. G. Romig, "Single Sampling and Double Sampling Inspection Tables", Bell Sys. Tech. Jour., Vol. XX (1941) pp. 1-61. An unpublished paper by Prof. Walter Bartky (developed when he was associated with the Western Electric Co., 1927) provides a continuous multiple sampling plan involving two factors: f, as used here, and i, the number of units in a "compensating sample" required to be inspected for each defective unit found.

³ Lt. H. J. Saunders, "Standardized Inspection", Army Ordnance, Vol. XXIV (1943) pp. 290-292; G. Rupert Gause, "Quality Through Inspection", Army Ordnance, Vol. XXIV (1943) pp. 117-120.

⁴ A unit of product that fails to meet the requirement for a characteristic is classed as nonconforming with respect to that characteristic, and for convenience is referred to as "defective". Thus, a deviation from a specified requirement or from accepted standards of good workmanship is termed a "defect".

⁵ H. F. Dodge and H. G. Romig, loc. cit.
defective which will not be exceeded no matter what quality is submitted to the inspector. This limiting value of per cent defective is termed the "average outgoing quality limit" (AOQL).

The theoretical solution treats the case of inspecting a continuous flow of individual units and is based on the distribution of random-order spacing of defective units in product whose quality is statistically controlled.⁶ Part III of the paper extends the application of the method to a continuous flow of individual lots or sub-lots of articles.

II. Inspection of a Flow of Individual Units 

2. Inspection of one characteristic. Consider first the inspection of a flow of 
individual units, offered consecutively in the order of their production. As- 
sume that inspection is to be made for only one quality characteristic, so that 
interest will be centered on one kind of defect. Subsequently (Section 13), 
consideration will be given to the procedures when inspection is made simul- 
taneously for several kinds of defects. 

3. Procedure A. The procedure is as follows:

(a) At the outset, inspect 100% of the units consecutively as produced and continue such inspection until i units in succession are found clear of defects.

(b) When i units in succession are found clear of defects, discontinue 100% inspection, and inspect only a fraction f of the units, selecting individual sample units one at a time from the flow of product in such a manner as to assure an unbiased sample.

(c) If a sample unit is found defective, revert immediately to a 100% inspection of succeeding units and continue until again i units in succession are found clear of defects, as in paragraph (a).

(d) Correct or replace with good units all defective units found.
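Procedure A can be simulated directly. A minimal sketch under assumptions not fixed by the text: units are defective independently with probability p (a statistically controlled process), inspection is perfect, defective units found are replaced, and the fraction f is drawn by inspecting each unit independently with probability f, which is one way of obtaining an unbiased sample. The function name and its return values are hypothetical.

```python
import random


def simulate_procedure_a(p: float, f: float, i: int, n_units: int,
                         seed: int = 0) -> tuple[float, float]:
    """Simulate Procedure A on a controlled process with fraction defective p.

    Returns (fraction of units inspected, fraction of units passed that are defective).
    Assumes perfect inspection, replacement of defectives found, and sampling of each
    unit independently with probability f.
    """
    rng = random.Random(seed)
    screening = True          # (a) start with 100% inspection
    clear_run = 0             # length of the current run of inspected units found clear
    inspected = 0
    passed_defective = 0
    for _ in range(n_units):
        defective = rng.random() < p
        if screening:
            inspected += 1
            clear_run = 0 if defective else clear_run + 1   # (d) a defective found is replaced
            if clear_run == i:
                screening = False                            # (b) switch to sampling at rate f
        elif rng.random() < f:
            inspected += 1
            if defective:
                screening, clear_run = True, 0               # (c) revert to 100% inspection
        elif defective:
            passed_defective += 1                            # uninspected defective goes out
    return inspected / n_units, passed_defective / n_units


if __name__ == "__main__":
    print(simulate_procedure_a(p=0.02, f=0.1, i=50, n_units=200_000))
```

Over a long run the two returned fractions estimate the average fraction of product inspected and the average outgoing fraction defective discussed below.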

4. Protection provided by the plan. The inspection plan is defined by the two constants, f and i, which can be altered at will. For given values of f, i, and p (incoming fraction defective), there will result for product of statistically controlled quality a definite average outgoing fraction defective (average outgoing quality, AOQ). For given values of f and i, the AOQ will have a maximum for some particular fraction defective $p_1$ of incoming quality. As noted above, this maximum is referred to as the average outgoing quality limit (AOQL). For all other values of incoming fraction defective p, greater or less than $p_1$, the AOQ will be less than the AOQL. Many combinations of f and i will result in the same AOQL.

The protection offered by the plan discussed here can thus be expressed in 
terms of the AOQL, in per cent defective. 


⁶ "Statistical control" as defined in the literature; see W. A. Shewhart, Statistical Method from the Viewpoint of Quality Control, The Graduate School, U. S. Dept. of Agriculture, 1939.
5. Theoretical framework. We are concerned with the spacing between defective units when the individual units are arrayed in the order of their production, as shown in Fig. 1. If the manufacturing process is statistically controlled so that the probability of producing a defective unit is constant and equal to p, then defective units will have an order spacing of a random character which is expressible in terms of certain probability laws. Product turned out by such a process will be referred to as having a process average fraction defective p. The "event" of particular interest is a "terminal-defect sequence" of i + 1 successive units following the observance of a defect, comprising a succession of i nondefective units followed by a defective unit, as shown in Fig. 1. The totality of all possible such sequences, where i varies from 0 to ∞, constitutes the universe of events under consideration.

Each such sequence of i + 1 units, comprising i successive nondefective units followed by a defective one, has a definite probability of occurrence for a process average fraction defective p. The complete set of such probabilities for all possible sequences, having respectively i = 0, 1, 2, 3, ..., ∞, defines a probability distribution⁷ of random-order spacing of defects in uniform product. This is

[Fig. 1. Spacing of defective units. Units are shown in order of production, O marking a nondefective unit and X a defective unit; a terminal-defect sequence O O ... O X is marked.]

shown in the table below, in which O represents a nondefective unit, X represents a defective one, p is the fraction defective, and q = 1 − p.



Defect spacing   No. of units   No. of nondefective units   Probability     No. of term in
(sequence)       in sequence    before finding next defect  of occurrence   the power series
X                     1                    0                    p              1st
OX                    2                    1                    pq             2nd
OOX                   3                    2                    pq²            3rd
OOOX                  4                    3                    pq³            4th
OOOOX                 5                    4                    pq⁴            5th
OO...OX             i + 1                  i                    pqⁱ          (i + 1)th
 ⋮                    ⋮                    ⋮                    ⋮              ⋮


⁷ Romanovsky, V., "Due Nuovi Criteri di Controllo Sull'andamento Casuale di Una Successione di Valori", Giornale dell'Istituto Italiano degli Attuari (1932), discusses this
These probabilities are the successive terms in the infinite power series

(1) $p + pq + pq^2 + pq^3 + \cdots$, or $p(1 + q + q^2 + q^3 + \cdots)$.

The sum of this series is $\dfrac{p}{1 - q} = 1$, i.e., the total probability for all possible sequences is unity (as it should be).

The sum of the first i + 1 terms of the series is the probability of occurrence of a "terminal-defect sequence" (defect spacing) of i + 1 units or less. The sum of the first i terms is the probability, $P_i$, of failing to find the next i units clear of defects, which is

(2) $P_i = \sum_{j=0}^{i-1} pq^j = 1 - q^i$.

In turn, the sum of all terms beyond the ith term is the probability of finding 0 defects in the next i units, which is

(3) $Q_i = 1 - P_i = q^i$.

These results and the power series (1) enter into subsequent portions of the discussion. The curves of Fig. 2 give values of $1 - q^i$.
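Equations (2) and (3) can be checked against the series (1) numerically; a short sketch with hypothetical names.

```python
def spacing_probability(p: float, j: int) -> float:
    """Term p q^j of series (1): j nondefective units followed by a defect."""
    return p * (1.0 - p) ** j


def p_fail(p: float, i: int) -> float:
    """P_i of (2): probability of failing to find the next i units clear of defects."""
    return 1.0 - (1.0 - p) ** i


def q_clear(p: float, i: int) -> float:
    """Q_i of (3): probability that the next i units are all clear, q^i."""
    return (1.0 - p) ** i


if __name__ == "__main__":
    p, i = 0.03, 10
    first_i_terms = sum(spacing_probability(p, j) for j in range(i))
    print(first_i_terms, p_fail(p, i), q_clear(p, i))
```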

6. Average outgoing quality. Suppose a plan is selected, choosing specific values of f and i.

For given values of i and p, there will be an expected average number of units, u, inspected following the finding of a defect. Likewise, for given values of f and p there will be an expected average number of units, v, that will be passed under the sampling procedure before a defect is found. The latter average number includes the sample units actually inspected as well as the uninspected units produced between successive sample units.

The average fraction of the total product units inspected in the long run is

(4) $F = \dfrac{u + fv}{u + v}$.


It is now assumed, for purposes of solution, that the inspection operation itself never overlooks a defect and that all defective units found during the inspections governed by f and i will be corrected or replaced by good units.⁸


probability distribution of spacing of events, referring to the spacing as the "length of a partial series". Our term "terminal-defect sequence" has the same significance as his term "partial series". See also P. S. Olmstead, "Note on theoretical and observed distributions of repetitive occurrences", Annals of Math. Stat., Vol. XI (1940) pp. 363-366; A. M. Mood, "The distribution theory of runs", Annals of Math. Stat., Vol. XI (1940) pp. 367-392.

⁸ The assumption that the inspection operation is perfect cannot be made without reservation. Machine inspection devices have their margins of error. Also, inspection fatigue prevents 100% manual and visual inspections from insuring perfection, particularly if such inspections continue over a considerable period of time. But the efficiency of the latter
As a result of the screening effect of the inspection, the average outgoing quality, AOQ, designated $\bar{p}$, is related as follows to the incoming quality p:

(5) $\bar{p} = p(1 - F) = \dfrac{p(1 - f)\,v}{u + v}$.
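Equations (4) and (5) as functions of u and v, a minimal sketch with hypothetical names. The quantities u and v are determined in the following sections, so here they are simply inputs, and the numbers in the example are arbitrary.

```python
def fraction_inspected(u: float, v: float, f: float) -> float:
    """(4): long-run fraction of product inspected, F = (u + f v) / (u + v)."""
    return (u + f * v) / (u + v)


def average_outgoing_quality(p: float, u: float, v: float, f: float) -> float:
    """(5): AOQ = p (1 - F), which equals p (1 - f) v / (u + v)."""
    return p * (1.0 - fraction_inspected(u, v, f))


if __name__ == "__main__":
    print(fraction_inspected(100, 400, 0.1), average_outgoing_quality(0.02, 100, 400, 0.1))
```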

7. Determination of u. The average number of units, u, inspected on a 100% inspection basis following the finding of a defect is a function of i and p, and may be determined from a consideration of two power series, one finite and the other infinite.

Once the 100% inspection starts, there are several things that can happen 
before i units are found clear of defects. The first i may be found clear; or 1, 
2, 3, or more defects may be found before finally a run of i units is found clear. 

One of the quantities to be determined is the average number of units inspected in a "failure sequence," that is, one terminating in a defect and comprising i or fewer units. This average number, designated h, is the average of the distribution made up of the first i terms of the power series (1). The average is

(6) $h = \dfrac{p}{1 - q^i}\left(1 + 2q + 3q^2 + \cdots + iq^{i-1}\right)$,

where the denominator is the sum of the probabilities for the first i terms. This may be evaluated as follows:

$h = \dfrac{p}{1 - q^i}\,\dfrac{d}{dq}\left(q + q^2 + \cdots + q^i\right) = \dfrac{p}{1 - q^i}\,\dfrac{d}{dq}\left[\dfrac{q(1 - q^i)}{1 - q}\right]$,

(7) $h = \dfrac{1}{p} - \dfrac{iq^i}{1 - q^i}$.

Note that if pi is small compared with unity, h is approximately 1/p. 

The next step is to determine the average number of failure sequences that will 
be encountered before finding i units clear of defects. This average number, 
designated as G, may be found from the probability distribution of all possible 
numbers of failure sequences, expressed by the infinite series 

(8) Qi(l + Pi + Pi + Pi + • • •) 

where Pi is given by equation (2), Qi = 1 — Pi , as given by equation (3), and 
the successive terms are the probabilities of occurrence of 0, 1, 2, 3, etc. failure 


inspections is generally higher when an interest incentive is provided as is usually the case 
in sampling inspection plans where the extent of such inspections hinges on their findings. 

The solution given assumes correction or replacement of defective units. Where it is 
expedient to reject such units and not replace them, equations (19) to (22) inclusive, should 
be modified by replacing i by i — 1. 
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Fig. 2. Curves defining distribution of random order spacing of defects in uniform product 
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sequences before finding i units clear of defects. The average number of failure 
sequences, (7, is given by the sum of the infinite series 

G = Qi(0 + IPi + 2Pl + 3P! + • • .) 

(9) = QiPid + 2Px + 3P? + 4P! + • • •). 


Summing the series, we have 

(10) G = <2*^1 (1 - Pjy2 = Qj 



Now u, the average number of pieces inspected following the finding of a 
defect, is made up of a number of failure sequences followed by a run of i units 
clear of defects. Using the average values of G and h just found, we have 


(11)  $u = Gh + i$.
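As a numerical check on equations (6), (7), (10) and (11), the sketch below evaluates h from the finite series (6) in exact rational arithmetic, compares it with the closed form (7), and compares u from (11) with a direct simulation of 100% inspection that stops at the first run of i clear units. It assumes, as the text does, that defects occur independently with constant probability p; the function names are illustrative only.

```python
import random
from fractions import Fraction


def h_series(p, i):
    """Mean length of a failure sequence, from the finite series (6)."""
    q = 1 - p
    weights = [p * q ** (k - 1) for k in range(1, i + 1)]  # first i terms of series (1)
    return sum(k * w for k, w in enumerate(weights, start=1)) / sum(weights)


def h_closed(p, i):
    """Closed form (7): h = 1/p - i q^i / (1 - q^i)."""
    q = 1 - p
    return 1 / p - i * q ** i / (1 - q ** i)


def u_closed(p, i):
    """Equation (11), u = G h + i, with G = P_i / Q_i from (10)."""
    q = 1 - p
    P_i, Q_i = 1 - q ** i, q ** i
    return P_i / Q_i * h_closed(p, i) + i


def simulate_u(p, i, trials=20_000, seed=1):
    """Average number of units inspected until i units in succession are clear."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        count = clear_run = 0
        while clear_run < i:
            count += 1
            clear_run = 0 if rng.random() < p else clear_run + 1  # a defect resets the run
        total += count
    return total / trials


p, i = Fraction(1, 50), 20
print(h_series(p, i) == h_closed(p, i))            # True: (6) and (7) agree exactly
print(float(u_closed(p, i)), simulate_u(0.02, i))  # closed form vs. simulation
```

Carrying the algebra one step further, (11) reduces to $u = (1 - q^i)/(pq^i)$, which is the form used in the later sketches.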


8. Determination of v. The average number of units, v, that will be passed
in a period of sampling inspection will be 1/f times the average number of
individual sample units inspected in such periods. Here again the solution will
depend on the random order spacing of defects in uniform product. Whether the 
individual units selected during the sampling inspection procedure are selected 
by a random spacing device, or by any other means which will prevent known 
bias in the sample, we may assume that defects will be found to occur in ac- 
cordance with the distribution of random order spacing defined by the terms of 
the series given in (1). The average number of sample units inspected in a 
period of sampling inspection will thus be the average defect spacing for product 
having fraction defective, p, which is given by the infinite series. 

(12)  $H = p(1 + 2q + 3q^2 + 4q^3 + \cdots)$.

Summing the series, we have 


(13)  $H = \frac{p}{(1 - q)^2} = \frac{1}{p}$,

and the value of v is found to be

(14)  $v = \frac{H}{f} = \frac{1}{fp}$.
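Equations (12) to (14) can be checked in the same way. The sketch below truncates series (12) and simulates periods of sampling inspection under an assumption of its own, not stated in the text: each unit is drawn into the sample independently with probability f. Under that assumption the number of units passed before a sampled defect appears is geometric with mean 1/(fp), in agreement with (14).

```python
import random


def H_series(p, terms=10_000):
    """Truncated series (12): p(1 + 2q + 3q^2 + ...), the mean spacing of defects."""
    q = 1 - p
    return p * sum(k * q ** (k - 1) for k in range(1, terms + 1))


def simulate_v(p, f, trials=5_000, seed=2):
    """Average number of units passed in a sampling period that ends with a sampled defect."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        passed = 0
        while True:
            passed += 1
            # the period ends only when a unit is both sampled and found defective
            if rng.random() < f and rng.random() < p:
                break
        total += passed
    return total / trials


p, f = 0.05, 0.2
print(H_series(p), 1 / p)             # (13): H = 1/p
print(simulate_v(p, f), 1 / (f * p))  # (14): v = 1/(fp)
```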


9. Determination of f and i for a given value of AOQL. From the considera-
tions given above, the average fraction of the product inspected, F, and the
value of average outgoing quality, $p_A$, can be determined for any given values
of p, f, and i. Substituting in (5) the values of u and v given in (11) and (14),
we have

(15)  $p_A = \frac{p(1 - f)(1 - p)^i}{f + (1 - f)(1 - p)^i}$.
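Because (15) is obtained by substituting (11), in the simplified form $u = (1 - q^i)/(pq^i)$, and (14) into (5), the two routes should give identical values. The sketch below checks this in exact rational arithmetic and then, as an illustration only, tabulates $p_A$ for a few values of p under an arbitrarily chosen plan f = 0.1, i = 50. The tabulation shows that the average outgoing quality is small both for very good and for very bad incoming product, which is why it has a maximum, the AOQL.

```python
from fractions import Fraction


def aoq_from_u_v(p, f, i):
    """Equation (5), p_A = p(1 - f) v / (u + v), using u from (11) and v from (14)."""
    q = 1 - p
    u = (1 - q ** i) / (p * q ** i)  # (11) after substituting G from (10) and h from (7)
    v = 1 / (f * p)                  # (14)
    return p * (1 - f) * v / (u + v)


def aoq(p, f, i):
    """Equation (15)."""
    q = 1 - p
    return p * (1 - f) * q ** i / (f + (1 - f) * q ** i)


f, i = Fraction(1, 10), 50
for p in (Fraction(1, 1000), Fraction(1, 100), Fraction(2, 100), Fraction(1, 10)):
    assert aoq_from_u_v(p, f, i) == aoq(p, f, i)  # exact agreement
    print(float(p), float(aoq(p, f, i)))
```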
The average outgoing quality limit, AOQL ($p_L$), is the maximum value of $p_A$
that will result for any given values of f and i, considering all possible values of
p in the submitted product. The value of p for which this maximum value of
$p_A$ occurs is designated by $p_1$; hence

(16)  $p_L = p_1\left[1 - \frac{f}{f + (1 - f)(1 - p_1)^i}\right]$.

The value of $p_1$ for which $p_A = p_L$ is determined by differentiating (15) with
respect to p, equating to 0, and solving for p, that is

(17)  $\frac{dp_A}{dp} = 1 - \frac{f^2 + f(1 - f)(1 - p)^i + pfi(1 - f)(1 - p)^{i-1}}{\left[f + (1 - f)(1 - p)^i\right]^2} = 0$.

Simplifying, and using the designation $p_1$ for the maximizing value of p, gives

$(i + 1)p_1 - 1 = \frac{1 - f}{f}(1 - p_1)^{i+1}$, or

(18)  $(1 - p_1)^i = \frac{f[(i + 1)p_1 - 1]}{(1 - f)(1 - p_1)}$.


Substituting this value of $(1 - p_1)^i$ in (16), we have

(19)  $p_L = \frac{(i + 1)p_1 - 1}{i}$;


hence

(20)  $p_1 = \frac{1 + ip_L}{i + 1}$.


From (18) and (19), we have

(21)  $p_L = \frac{1 - f}{fi}(1 - p_1)^{i+1}$, hence

(22)  $f = \frac{(1 - p_1)^{i+1}}{ip_L + (1 - p_1)^{i+1}}$.


The curves given in Fig. 3 were calculated by choosing values of i for given
values of AOQL ($p_L$) and calculating $p_1$ from equation (20) and f from equation
(22). Thus for a given AOQL value, an i value may be found for a chosen f
value and vice versa. It will be noted that for a given value of f, i varies in-
versely with the AOQL value, to a close degree of approximation.
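The construction of Fig. 3 can be sketched as follows: given an AOQL $p_L$ and a chosen i, compute $p_1$ from (20) and f from (22), then confirm by a brute-force grid search over p that the maximum of (15) is indeed $p_L$ and occurs at $p_1$. The grid search is only a check added here and is not part of the method in the text; the values $p_L$ = 0.01 and i = 100 are arbitrary.

```python
def plan_for_aoql(p_L, i):
    """Return (p1, f) for a given AOQL p_L and clearance number i, from (20) and (22)."""
    p1 = (1 + i * p_L) / (i + 1)                               # (20)
    f = (1 - p1) ** (i + 1) / (i * p_L + (1 - p1) ** (i + 1))  # (22)
    return p1, f


def aoq(p, f, i):
    """Average outgoing quality, equation (15)."""
    q = 1 - p
    return p * (1 - f) * q ** i / (f + (1 - f) * q ** i)


def max_aoq(f, i, p_max=0.2, steps=100_000):
    """Brute-force maximum of (15) over a grid of p in (0, p_max]; returns (value, p)."""
    return max((aoq(p_max * k / steps, f, i), p_max * k / steps) for k in range(1, steps + 1))


p_L, i = 0.01, 100
p1, f = plan_for_aoql(p_L, i)
best, p_best = max_aoq(f, i)
print(f"f = {f:.4f}, p1 = {p1:.5f}, max AOQ = {best:.6f} at p = {p_best:.5f}")
```

Since $p_1 \approx p_L + 1/i$ for large i, the right side of (22) depends, to a close approximation, only on the product $ip_L$; this is the reason i varies nearly inversely with the AOQL for fixed f.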


10. Operating characteristics of the plan. Figs. 4(a) and 4(b) give a picture
of the operating characteristics of the general plan as f and i are varied. They
indicate, for example, that for a moderate range of f values the factor i has a
stronger influence than f in determining the discrimination that the method
affords between high and low levels of incoming per cent defective. For the
values of f and i shown, Fig. 4(b) indicates just what level of incoming per cent
defective would force a correction of the manufacturing process, if the percentage
of total production that would be accepted on a sampling basis falls below a
critical value, often a value of the order of 80% to 90%.

Fig. 3. Curves for determining values of f and i for a given value of AOQL (axis label: i, number of units)

Fig. 5 gives a comparison of the characteristics of several plans having the
same AOQL value, 1%. It indicates, for example, that when the normal level
of incoming per cent defective is well below the AOQL, the AOQL value can be
assured with less inspection by choosing f small and i large. But since, for a
given AOQL value, the average amount of inspection approaches a minimum
as f approaches 0, factors other than the minimum amount of inspection have a
more important influence on the choice of the most advantageous combination
of f and i values for a given set of circumstances. For example, when the
inspector is located at the end of the production line, it may be desirable to use
a value of i not greater than some small multiple of the number of product units
on the line at any one time. Or again, the value of f is often influenced by the
normal work loads of the inspector and the operators on the line. Protection
against “spotty” quality, such as may arise from temporary irregularities in
workmanship or materials, should receive special consideration in connection
with the choice of f.

Fig. 4. Curves showing effect of f and i on operating characteristics of plan: (a) per cent of product units accepted without inspection; (b) per cent of total production accepted on a sampling basis; both plotted against incoming per cent defective

Fig. 5. Characteristics of three plans having the same AOQL of one per cent (axis label: per cent of product units accepted without inspection)
11. Protection against spotty quality. The $p_t$ scale at the right of Fig. 3 pro-
vides a guide concerning the protection afforded against spotty quality in a
continuous run of product. The value of $p_t$ is the per cent defective in a run of
1000 consecutive product units for which the probability of acceptance by sam-
ple is 0.10, for a percentage sample equal to the corresponding f value shown on
the chart.

This scale indicates that the protection against spotty quality falls off very
rapidly as f decreases and that the protection, considering runs of product of 1000
consecutive units each, becomes quite poor if f is less than 2%.
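The text does not spell out the calculation behind the $p_t$ scale. One possible reading, assumed here and not taken from the paper, is that a sample of n = 1000f units is drawn at random without replacement from the run of 1000 units, that the run is accepted if the sample contains no defect, and that $p_t$ is the smallest per cent defective for which this acceptance probability does not exceed 0.10. Under that assumption the hypergeometric probability gives the following sketch; `spotty_pt` is a hypothetical helper name.

```python
from math import comb


def prob_accept(N, d, n):
    """P(no defect in a random sample of n drawn without replacement from N units with d defects)."""
    return comb(N - d, n) / comb(N, n)


def spotty_pt(f, N=1000, beta=0.10):
    """Smallest per cent defective in a run of N units whose acceptance probability is at most beta."""
    n = round(f * N)
    for d in range(N + 1):
        if prob_accept(N, d, n) <= beta:
            return 100 * d / N
    return 100.0


for f in (0.01, 0.02, 0.05, 0.10, 0.20):
    print(f, spotty_pt(f))
```

Under this reading, a 10% sample gives $p_t$ = 2.2 per cent, and the value climbs steeply as f is reduced, which is the qualitative behaviour the text describes.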

12. Effect of selecting group samples rather than one unit at a time. The
above development assumes selection of individual sample units one at a time
from the flow of product and immediate examination of a unit to determine
whether or not it is defective. Deviations from this procedure will in general
result in values of AOQL higher than those shown in Fig. 3.

For example, the actual AOQL may be higher than the theoretical value (a)
if the inspector delays examining the individual units after they
are withdrawn from the line, or (b) if he selects a group of units at one time
from the production line. The effect of either of these two deviations, both 
constituting a delay, may be quite large if i is small, or if large group samples 
are taken. 

Although the modification of the theoretical AOQL value resulting from the
selection of group samples has not been thoroughly explored, it should not be
excessive

(a) if group samples of n = 10 or less are drawn from the line, and

(b) if i = 50 or more,

provided there is no delay in examining the group samples drawn from the line. 

It should be noted, however, that the effect of these delay factors on the AOQL
may be compensated for in part if, when a defect is found, the 100% inspection 
includes some of the units that have already passed the inspection point. 

Where appreciable delays are unavoidable, an alternative is to withhold from 
acceptance a stipulated number of units pending the examination of the sample 
units that have been selected to represent this quantity of product. Such a
procedure provides in effect a lot acceptance plan, the treatment for which is 
covered in Part III. 

13. Administration of inspection operations. The inspection plan is most 
effective in practice if it is administered in such a way as to provide an incentive 
to clear up causes of trouble promptly. Such an incentive may be had by im- 
posing a penalty on the operating or manufacturing department when defects 
are encountered. Normally, no such penalty is imposed if both the sampling 
inspection and the 100% inspection are performed by the same person or group 
of persons and the two costs merged: the inspector then merely serves as an 
agency for screening defects when quality goes bad. It is accordingly recommended
that the sampling inspection and the 100% inspection operations be
treated as two separate functions. 

With this in mind, the inspection work can be performed by two different 
inspectors, designated inspector C and inspector M. Inspector C may be 
considered as the consumer's representative in that his work is performed as a 
function independent of the manufacturing group. The term “consumer”
is used in the general sense of the recipient of the product after the inspection 
has been completed. Inspector M is responsible to the Manufacturing Depart- 
ment and the cost of his work is borne by that Department. His work must 
however be subject to the surveillance and approval of inspector C. 

The following method of administering the inspection plan can then be used: 

(a) Inspector C inspects the required fraction f. So long as no defects are 
found, product is considered acceptable and is passed. 

(b) When inspector C finds a defect, he 

1. continues inspecting the fraction f,

2. places some identification on the succeeding flow of product to indicate 
nonacceptance (or diverts it from the regular production line if the 
design of the line permits), such designation to apply until clearance 
is obtained in accordance with paragraph (c), and 

3. calls inspector M to inspect the succeeding flow of product in accord- 
ance with paragraph (c). 

(c) Inspector M (one or more inspectors as needed) inspects all succeeding 
units, except those inspected by inspector C in the fraction f, until the
required number of units, i, are found clear of defects. Inspector M 
reports immediately to Inspector C all defects found in the course of his 
100% inspection and notifies him when a run of i units has been found 
clear of defects. 

(d) When notified that a run of i units has been found clear of defects, in- 
spector C, if satisfied with the work of inspector M, releases inspector M. 

(e) To facilitate speedy correction of causes of trouble, inspector C, on finding 
a defect, should promptly notify the production foreman or other desig- 
nated authority and furnish the latter with detailed information regarding 
the character of the defect found. 
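Seen from the product's side, paragraphs (a) to (d) amount to a simple rule: inspect a fraction f until a defect is found, then inspect every unit until i units in succession are clear. The simulation below follows that rule on a stream of units, under the assumptions of Part II (defects independent with constant probability p, every defect found is corrected) plus one assumption of its own: each unit is drawn into the fraction f independently with probability f. It starts with 100% inspection. Its long-run outgoing fraction defective and fraction inspected can be compared with (15) and with the fraction inspected implied by u and v, $F = (u + fv)/(u + v)$. The plan values are arbitrary.

```python
import random


def simulate_plan(p, f, i, n_units, seed=0):
    """Return (outgoing fraction defective, fraction of units inspected) for a simulated stream."""
    rng = random.Random(seed)
    screening, clear_run = True, 0        # begin with 100% inspection
    inspected = passed_defects = 0
    for _ in range(n_units):
        defective = rng.random() < p
        if screening:                     # every unit inspected (inspector M plus C's fraction f)
            inspected += 1
            clear_run = 0 if defective else clear_run + 1
            if clear_run == i:            # paragraph (d): run of i clear units, release M
                screening = False
        elif rng.random() < f:            # unit falls in inspector C's fraction f
            inspected += 1
            if defective:                 # paragraph (b): defect found, call inspector M
                screening, clear_run = True, 0
        elif defective:
            passed_defects += 1           # uninspected defective passes to the consumer
    return passed_defects / n_units, inspected / n_units


def predicted(p, f, i):
    """AOQ from (15), and F = (u + f v) / (u + v) with u, v from (11) and (14)."""
    q = 1 - p
    u, v = (1 - q ** i) / (p * q ** i), 1 / (f * p)
    return p * (1 - f) * q ** i / (f + (1 - f) * q ** i), (u + f * v) / (u + v)


observed = simulate_plan(0.02, 0.1, 50, 500_000)
print(observed, predicted(0.02, 0.1, 50))
```

Note that the split of work between inspectors C and M does not enter the simulation; only the set of units inspected affects the outgoing quality.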

It will be noted that the above procedure requires calling inspector M whenever 
inspector C finds a defect. To avoid taking such action on the occurrence of 
a single defect, the procedure can be modified so that inspector M is called into 
the picture only when two defects in succession are observed by inspector C. 
Where this feature is desired, paragraph (b) above may be modified to read 
as follows: 

(b) When inspector C finds a defect, he 

1. proceeds immediately to inspect all succeeding units up to a total of
i units, and if no defects are found therein, he again limits his inspection
to the fraction f. If, on the other hand, during the course of inspecting
the next i units, inspector C finds a second defect, he immediately
discontinues his 100% inspection,

2. places some identification on the succeeding flow of product . . . etc.

While this procedure carries the disadvantage of placing a varying work load
on inspector C, it is often preferred, since a single defect tends to be regarded as
an isolated occurrence, whereas two defects in quick succession (like a first and
second offense) are normally accepted as sufficient evidence to justify special
action.

14. Inspection for several kinds of defects simultaneously. The procedure 
given above may be applied directly to an inspection covering two or more kinds 
of defects, provided that the chosen AOQL value applies to all defects collectively 
and each unit inspected is always inspected for all of the defects under considera- 
tion. 

It is sometimes desired, however, when a defect of one kind is observed, to 
confine the 100% inspection to this one kind of defect alone. This requires a 
modification of the general procedure and the establishment of a separate AOQL 
for each kind of defect. A similar modification is required for example where 
the inspection is to cover several kinds of defects, but where the defects are 
grouped into two or more classes, according to their seriousness, and the defects 
in each class treated collectively. 

The following paragraphs outline for illustrative purposes a procedure for 
use where the defects under consideration are to be classified into two groups, 
Major and Minor, and where all Major defects are to be treated collectively
and all Minor defects likewise. By analogy, the procedure to be followed when
each kind of defect is to be treated separately will be obvious. In any event,
the fraction f is made the same for all classes or all kinds of defects.

Procedure 

Several kinds of defects are grouped into two classes with respect to serious-
ness, designated Major and Minor.

All defects of the same class (Major or Minor) are treated collectively.

Preliminary

(1) Establish an overall AOQL value for Major defects and an overall AOQL
value for Minor defects. Select a suitable value for f, applicable to both
Major and Minor defects. From Fig. 3 determine a value of i for Major
defects, designated $i_A$, and a value of i for Minor defects, designated $i_B$.

(2) At the outset, inspect 100% of the units consecutively for both Major
and Minor defects until $i_{\max}$ units in succession are found clear of defects
($i_{\max} = i_A$ or $i_B$, whichever is the larger).

Routine 

(3) When $i_{\max}$ units in succession are found clear of defects, discontinue 100%
inspection and inspect only a portion f of the units for both Major and
Minor defects, selecting individual sample units one at a time from the
flow of product.

(4) If a Major (or Minor) defect is observed during sampling inspection,
inspect 100% of the succeeding units only for defects of the class in question
until $i_A$ (or $i_B$) units in succession are found clear of defects of this class.

(4.1) During the 100% inspection referred to in (4), inspect a portion f
for both Major and Minor defects.

(4.2) If during the 100% inspection for a particular class of defect (Major
or Minor), a defect of the other class is observed on an individual
unit of product, start 100% inspection for defects of the new class
only if the new defect is observed on one of the f units that have been
inspected for both Major and Minor defects, and continue such
100% inspection for defects of the new class until i (as determined
in (1) for the new class) units in succession are found clear of defects
of the new class. Do not take such action, however, if the new
defect happens to be observed on one of the non-f units.

(5) When the proper number of successive units are found clear of defects 
as in paragraph (4) or (4.2), reinstate sampling inspection as in para- 
graph (3). 

From the above it may be appreciated that difficulties of administration are 
introduced in treating a large number of classes of defects or a large number of 
individual defects separately. How best to group defects together for collective 
treatment can generally be determined from the nature of the inspection opera- 
tions, whether visual or gauging, and the expectancy of defects as determined 
from the quality history. Items involving visual inspection can often be treated
collectively to advantage.

As is generally true, the layout of an inspection plan depends to a considerable 
extent on the nature of inspection operations to be performed. Simplicity of 
administration is always to be desired. From the standpoint of minimizing 
overall inspection costs, it is often preferable, where several quality character- 
istics are to be inspected, to break down the inspection work into two or more 
separate inspection steps, each covering a relatively small number of char- 
acteristics. 

III. Inspection of a Flow of Individual Lots or Sub-lots

15. Purposes of Inspection. A manufacturer’s inspection of his own product
serves two purposes*:

(a) Process Control: to provide a basis for action with regard to the production
process with a view to better future product.

(b) Product Acceptance: to provide a basis for action with regard to the
product already at hand.

The plan outlined in Part II has both of these purposes in mind, but the provi- 
sion for selecting sample units continuously from the production line places 
special emphasis on control. It aids, for example, in the prompt detection of 
defects and location of causes of trouble in the manufacturing process.

* See A. S. A. War Standard Z1.3, Control Chart Method of Controlling Quality During
Production, pp. 5-6, 1942, American Standards Association, New York.
The problem of acceptance of product is often eased, though at some sacrifice
to the control aspects of the inspection work, if product is submitted to the 
inspector in lots or sub-lots and a sample taken from each. 

16. Inspection procedure for sub-lots. With minor modifications, the plan 
and procedure of Part II can be extended to the case where material is offered 
as a flow of consecutive sub-lots of articles. In the inspection of parts, for example,
the material may be offered in pan-loads or trays, each containing a collection 
of parts produced under essentially the same conditions. Or again, the product 
from a common source for a given short period of time, such as a half-hour, 
one hour, etc., may often be treated as a sub-lot and offered to the inspector as 
such for his acceptance. In what follows, however, it is essential that such 
sub-lots be kept in the order of their production. 

The theoretical development given in Part II makes use of random-order 
spacing of defects in a statistically controlled product, with the specific provision 
that the units inspected be selected in the order of their production. In applying 
the general plan to the inspection of a flow of consecutive sub-lots, we no longer 
have individual units available in the order of their production. But we can
use the same theoretical framework if we consider the random spacing of defects
as their spacing in the chain of inspected units arranged in the order of their 
inspection. The probability distribution of the spacing of defects in inspected 
units will be the same regardless of the manner of selecting the units to be 
inspected, so long as we hold to the concept of statistical control in our solution. 

The “i units in succession to be found clear of defects,” discussed in Part II,
will now be defined as i consecutively inspected units. During sampling inspec-
tion, a group sample of units will be selected from each sub-lot, and the fraction
f will relate to the ratio of the number of units in the sample to the total number
of units in the sub-lot. The fraction f will be held constant for all sub-lots.
Furthermore, when it is required under the general plan to find i inspected units 
in succession clear of defects, the 100% inspection must be allowed to extend 
into immediately succeeding sub-lots if i units in succession are not found clear 
in the current sub-lot. 

17. Procedure B. The procedure is as follows:

(a) At the outset, start inspecting 100% of the units in a sub-lot and continue
such inspection until i inspected units in succession are found clear of
defects. Extend the 100% inspection, if necessary, into one or more
succeeding sub-lots in the order of their production.

(b) When i inspected units in succession are found clear of defects, discontinue 
100% inspection and inspect only a fraction f of the units from each of
the sub-lots, selecting the sample units in such a way as to fairly represent 
the sub-lot. 

(c) If a sample unit is found defective, start a 100% inspection of the re-
mainder of the sub-lot, and continue the 100% inspection until again i
inspected units in succession are found clear of defects, as in paragraph
(a), extending such inspection into succeeding sub-lots, if necessary.

(d) In the event the 100% inspection extends into one or more succeeding
sub-lots, if the number of units inspected in the last of such succeeding
sub-lots exceeds a fraction f of the number of units in the sub-lot, accept
this last sub-lot without further inspection. If, on the other hand, the
number of units inspected in this last sub-lot is less than the fraction f,
inspect additional units from this same sub-lot to make up a sample equal
to a fraction f of the number of units in the sub-lot.

(e) Correct or replace with good units all defective units found. 

As was the case in Part II, the inspection plan is defined by two constants,
f and i, and the protection offered is expressed in terms of AOQL. This sub-lot
inspection plan differs from those already published in that the screening action 
is not confined to a single sub-lot but may extend over a succession of sub-lots, 
the entire production being regarded as a train of sub-lots that are linked together 
for purposes of inspection in the order of their production. 

IV. Remarks

It will have been noted that the plan here outlined should be regarded as a
“special purpose” plan applicable under the conditions which have been enu-
merated: where production is practically continuous, where inspection is to be
made during production or immediately thereafter and is to serve not only as 
a screening acceptance agency if necessary, but as an aid to process control by 
disclosing promptly any sub-standard quality conditions in the product. It 
is believed that the general plan provides a structure, which with possible var- 
iations in procedure to serve particular circumstances, may be found useful in 
designing additional sampling inspection techniques. 



ON THE THEORY OF RUNS WITH SOME APPLICATIONS TO 
QUALITY CONTROL¹

By J. Wolfowitz 
Columbia University 

1. Recent developments in the theory of runs. The increasing number and 
importance of recent advances in the theory and statistical applications of runs 
may make a brief paper on the subject of some interest. The large volume of 
material and its wide dispersal, together with the limitations of space, will of 
necessity make these remarks far from exhaustive and complete. 

I shall not define a run because new advances and applications of new criteria 
to new problems would probably soon render most definitions obsolete. Runs 
as used in statistics are best characterized by a philosophy and a technique rather 
than by the employment of any one specific device. What is always involved is 
the ordering of observations according to some characteristic and the resultant 
effect of this ordering on the ordering according to some other characteristic. 
For example, if the seats at a meeting of statisticians and engineers are numbered 
and occupied by m engineers and n statisticians, then if we list the numbers of the 
occupied seats in ascending order and replace each number by E or S according
as the seat is occupied by an engineer or statistician, we shall have a sequence of
m + n elements, m E's and n S's. Thus, if m = 7 and n = 6, such a sequence
might be 

EEESEESSSESSE. 

If we were interested in knowing how well engineers and statisticians are ac-
quainted with one another, we should find it of interest to study the runs of E's
and S's in this sequence. Any subsequence of consecutive E's or S's which can-
not be enlarged is called a run. Thus in the example above there is a run of E's
of length 3, followed in order by a run of S's of length 1, a run of E's of length 2, a
run of S's of length 3, a run of E's of length 1, a run of S's of length 2, and a
run of E's of length 1. Runs of this kind are usually called runs of two kinds of
elements. Naturally the characteristic according to which we order (in the 
example above, seat number) and the characteristic whose runs are observed 
(E or S) may be various. They ought in general to have a meaningful connec- 
tion. 
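Splitting a sequence into runs in this sense, that is, into maximal blocks of identical consecutive elements, is exactly what Python's standard `itertools.groupby` does. The sketch below applies it to the seating example and recovers the seven runs just listed; the helper name `runs` is illustrative.

```python
from itertools import groupby


def runs(sequence):
    """Maximal runs of identical consecutive elements, as (element, length) pairs."""
    return [(element, len(list(group))) for element, group in groupby(sequence)]


seating = "EEESEESSSESSE"  # the example with m = 7 engineers and n = 6 statisticians
print(runs(seating))
# [('E', 3), ('S', 1), ('E', 2), ('S', 3), ('E', 1), ('S', 2), ('E', 1)]
```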

The order of observations has no value if it is known that the observations are 
independent and random from the same universe and one seeks to estimate a 
parameter of the universe. Many of the statistical problems treated in the 
literature are of this character. In quality control of manufactured articles one
of the fundamental problems is to decide whether the observations are “random,”
or in the language employed in this field, whether statistical control exists. For
this purpose indiscriminate pooling of data which suppresses the order charac-
teristics of the observations represents a loss of valuable information.

¹ Revised from an expository address delivered at a joint meeting of the Institute of
Mathematical Statistics and the American Society of Mechanical Engineers at New York,
May 29, 1943, at the invitation of the program committee.

The algebra of runs of two kinds of elements is fairly elementary and most of 
the distribution problems involved have been solved. Suppose an urn contains 
m white balls and n black balls, thoroughly mixed, and m + n drawings are 

made without replacement. There are $\frac{(m + n)!}{m!\,n!}$ different sequences of
m W's and n B's possible, and each sequence has the same probability. Let us find in
how many ways the m elements W can be arranged to give k runs. By a trick
due to Euler, this is the coefficient of $x^m$ in the purely formal expansion of

$(x + x^2 + \cdots + x^m)^k$,

which is the same as the coefficient of $x^m$ in the formal expansion of

$(x + x^2 + x^3 + \cdots)^k = x^k(1 - x)^{-k}$,

and is therefore $\binom{m-1}{k-1}$, which is, of course, the combinatorial symbol for

$\frac{(m - 1)!}{(m - k)!\,(k - 1)!}$.

It is easy to see that the number of sequences of W's and B's which have 2k
runs of both kinds is $2\binom{m-1}{k-1}\binom{n-1}{k-1}$, and hence that the probability
that U, the number of runs of both kinds, be 2k is

$P(U = 2k) = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}$.

The details of this and other relevant derivations can be found in Wilks [1],
Mood [2], Wald and Wolfowitz [3], and Stevens [12]. The formulae given there
are of the type given above, e.g., for the probability that U = c. Application
to tests of significance usually requires formulae of the type which give the
probability that U ≤ c. This causes some difficulty in application and raises a need
for suitable tables. Useful tables have been given by Swed and Eisenhart [4]
and by P. S. Olmstead in an article by Mosteller [6]. The latter table really
deals with a special case of runs of two kinds of elements.

The devices described above were systematically utilized by Mood [2] to give
a valuable collection of formulae. A representative result is that the joint
distribution of the numbers of runs of length 1, 2, ..., p and all those of length
greater than p is asymptotically normal, with means and covariance matrix
given.
The results given by Mood are limited to a classification of runs into a finite
number of classes. The author [6] has given a general result which permits 
weighting runs of all lengths. 
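The counting argument for runs of two kinds of elements given earlier in this section can be checked by brute force for small m and n. The sketch below computes the number of arrangements with 2k runs, $2\binom{m-1}{k-1}\binom{n-1}{k-1}$, compares it with a direct enumeration of all $\binom{m+n}{m}$ arrangements, and checks Euler's device by expanding $(x + x^2 + \cdots + x^m)^k$ numerically. The function names are illustrative, and m = 7, n = 6 simply reuses the seating example.

```python
from itertools import combinations
from math import comb


def count_even_runs(m, n, k):
    """Number of arrangements of m W's and n B's with exactly 2k runs (k >= 1)."""
    return 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)


def enumerate_runs(m, n):
    """Brute force: tally the total number of runs U over all arrangements."""
    tally = {}
    for positions in combinations(range(m + n), m):
        seq = ["B"] * (m + n)
        for j in positions:
            seq[j] = "W"
        u = 1 + sum(seq[t] != seq[t - 1] for t in range(1, m + n))
        tally[u] = tally.get(u, 0) + 1
    return tally


def euler_coefficient(m, k):
    """Coefficient of x**m in (x + x**2 + ... + x**m)**k, by repeated multiplication."""
    poly = [1] + [0] * m  # coefficients of the constant polynomial 1
    for _ in range(k):
        new = [0] * (m + 1)
        for degree, c in enumerate(poly):
            for e in range(1, m + 1 - degree):
                new[degree + e] += c
        poly = new
    return poly[m]


m, n = 7, 6
tally = enumerate_runs(m, n)
for k in range(1, 7):
    assert tally.get(2 * k, 0) == count_even_runs(m, n, k)
    assert euler_coefficient(m, k) == comb(m - 1, k - 1)
print({2 * k: count_even_runs(m, n, k) / comb(m + n, m) for k in range(1, 7)})  # P(U = 2k)
```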

Closely allied to runs of two or more kinds of elements are runs from a bino-
mial or multinomial population. If the observations are classified into k classes,
designated by 1, 2, ..., k, say, and each observation has a constant probability
$p_i$ of falling into the ith class (i = 1, 2, ..., k), then a sequence of l observations
all of which belong to the same class and which is preceded and followed by ob-
servations which belong to another class (except, of course, when the sequence
is at the beginning or at the end of the series) is called a run of length l. If a
coin, whether unbiassed or not, is tossed repeatedly, the runs of heads and tails
are runs from a binomial population (i.e., k = 2), and if the coin is unbiassed,
$p_1 = p_2 = \frac{1}{2}$.

The algebra of these runs has been studied mainly by von Bortkiewicz [7], 
von Mises [8], Wishart and Hirshfeld [9], Cochran [10], and Mood [2]. Runs 
from a binomial population (say) differ from runs of two kinds of elements in 
that m and n (defined above) are chance variables. If therefore, in general, a
distribution formula valid for a fixed m and n be multiplied by the probability
$\binom{m+n}{m}p_1^m p_2^n$ of this particular set of m and n, and summed over all m and n
with m + n equal to the total number of observations,
the result will be the corresponding distribution formula for runs from a binomial
population. Von Bortkiewicz [7], Cochran [10] and Mood [2] derived the essen-
tial parameters involved. Wishart and Hirshfeld [9] proved the asymptotic
normality of the total number of runs from a binomial population, and these
results were generalized by Mood [2].
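As an example of the mixing rule just described, take the expected total number of runs rather than a full distribution. For fixed m and n it is the standard result $1 + 2mn/(m + n)$. Weighting by $\binom{N}{m}p_1^m p_2^{N-m}$ with N = m + n and summing should give $1 + 2(N - 1)p_1p_2$, since each of the N − 1 adjacent pairs of observations differs with probability $2p_1p_2$. The sketch checks this numerically; the values of N and $p_1$ are arbitrary.

```python
from math import comb


def mean_runs_fixed(m, n):
    """Expected number of runs for a random arrangement of m and n elements (m + n >= 1)."""
    return 1 + 2 * m * n / (m + n)


def mean_runs_binomial(N, p1):
    """Mix the fixed-(m, n) result over the binomial distribution of m, with n = N - m."""
    p2 = 1 - p1
    return sum(comb(N, m) * p1 ** m * p2 ** (N - m) * mean_runs_fixed(m, N - m)
               for m in range(N + 1))


N, p1 = 20, 0.3
print(mean_runs_binomial(N, p1), 1 + 2 * (N - 1) * p1 * (1 - p1))
```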

Von Mises [8] proved that if N be the number of observations from a binomial
population, the distribution of the number of runs of a length which is of the
order of log N approaches the Poisson distribution with increasing N.

Cochran [10], extending the work of Gold [11], made use of runs of this kind in 
order to study what they called “the persistence of weather”, i.e., whether dry
months tend to follow dry months and wet months to follow wet months. In a 
long series of weather observations the months were classified as wet or dry and 
a four-fold table constructed of the number of months falling into each of the 
following categories: 

(a) wet month following a wet month 

(b) wet month following a dry month 

(c) dry month following a wet month 

(d) dry month following a dry month. 

The chi-square test was applied to the four-fold table to test the null hypothesis 
that the probability of whether a month was wet or dry was independent of what 
its predecessor had been. 
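Cochran's four-fold table is straightforward to form from a record of wet (W) and dry (D) months. The sketch below counts the four kinds of transitions and computes the ordinary Pearson chi-square for a 2 × 2 table without continuity correction. Whether Cochran applied a correction is not stated here, and the short record in the example is invented purely for illustration.

```python
from collections import Counter


def four_fold_table(months):
    """Counts of (previous month, this month) pairs; each month is 'W' (wet) or 'D' (dry)."""
    return Counter(zip(months, months[1:]))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


record = "WWWWDDDDWWWWDDDD"  # hypothetical record of 16 months
t = four_fold_table(record)
# rows: previous month wet / dry; columns: this month wet / dry
chi2 = chi_square_2x2(t[("W", "W")], t[("W", "D")], t[("D", "W")], t[("D", "D")])
print(dict(t), round(chi2, 3))
```

Under the null hypothesis the statistic is referred to the chi-square distribution with one degree of freedom, whose 5 per cent point is 3.84.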

Olmstead [13] has made use of a run which is very similar to a run from
a binomial population, except that the sequence terminates whenever an observation
on a specified one of the two classes (a “failure”) is recorded. The author
[6] has used a run defined as a sequence of consecutive integers in a permutation
of the first n integers to test whether two variates are independently distributed
when nothing is known about their distribution functions except that they are
continuous. The rank correlation coefficient is usually employed for this purpose.

Of great importance in quality control of manufactured output are runs up and 
down. If, in any of the n! equally likely (by hypothesis) permutations of the 
first n integers, we subtract each element from its successor and replace the result 
by + or − according as the difference is positive or negative, we get runs of +
signs and − signs, called respectively runs up and down. The usage of the term
length varies; in this paper we shall say that the length of a run is the number of
+ or − signs in it. This has the advantage that then the sum of the lengths of
all the runs is n − 1. (Most quality control literature, which follows Shewhart
[14] and Kermack and McKendrick [15], defines the length of a run as one more
than the number of + or − signs in it.) Thus, for example, the sequence

3 4 7 6 5 1 2

will appear as

3 + 4 + 7 − 6 − 5 − 1 + 2

after the + and − signs have been inserted, and has an ascending run of length
2, followed by a descending run of length 3, followed by an ascending run of
length 1.
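The rule above can be sketched in a few lines: compare each element with its successor, record + or −, and group equal signs into runs. The function name is hypothetical, and lengths follow this paper's convention (number of signs, not Shewhart's number of points).

```python
def runs_up_and_down(seq):
    """Runs up and down of a sequence of distinct numbers, as (sign, length)
    pairs, where the length of a run is the number of + or - signs in it."""
    runs = []
    for a, b in zip(seq, seq[1:]):
        sign = "+" if b > a else "-"
        if runs and runs[-1][0] == sign:
            runs[-1] = (sign, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((sign, 1))               # start a new run
    return runs


example = [3, 4, 7, 6, 5, 1, 2]
runs = runs_up_and_down(example)
print(runs)                                                  # [('+', 2), ('-', 3), ('+', 1)]
print(sum(length for _, length in runs) == len(example) - 1)  # True
```

Adding 1 to each length converts to the Shewhart convention mentioned in the parenthesis.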

The distributions associated with runs up and down in general present mathe- 
matical difficulties greater than those associated with distributions of runs of two 
kinds of elements and the results are far from complete. The asymptotic 
expectation of $r_p$, the number of runs of length p, was given with great brevity
by Fisher [16] and in detail by Kermack and McKendrick [15], and the exact
result was supplied by Wallis and Moore [17]. The matrix of covariances among
the runs of various lengths is being computed and, it is hoped, will be available
for publication shortly. As far as the author is aware, no explicit formulae
giving the probability that $r_p = k$ or that $r_p < k$ are known. Some recursion
formulae of limited usefulness are available.
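For small n the exact expectation of $r_p$ can be obtained by brute force, enumerating all n! permutations. The sketch below does this and compares the result with a closed form that we take to be the standard exact result (an assumption on our part, since the text does not print it): $E(r_p) = 2[n(p^2 + 3p + 1) - (p^3 + 3p^2 - p - 4)]/(p + 3)!$ for $p \le n - 2$, and $E(r_{n-1}) = 2/n!$. Lengths follow this paper's convention.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def run_lengths(seq):
    """Lengths (number of + or - signs) of the runs up and down in seq."""
    lengths, previous = [], None
    for a, b in zip(seq, seq[1:]):
        up = b > a
        if up == previous:
            lengths[-1] += 1
        else:
            lengths.append(1)
            previous = up
    return lengths


def exact_expectations(n):
    """E(r_p) for p = 1, ..., n - 1, by enumerating all n! permutations."""
    counts = [0] * n                       # counts[p] accumulates runs of length p
    for perm in permutations(range(1, n + 1)):
        for length in run_lengths(perm):
            counts[length] += 1
    return {p: Fraction(counts[p], factorial(n)) for p in range(1, n)}


def closed_form(n, p):
    """Assumed closed form for E(r_p) (see the lead-in)."""
    if p == n - 1:
        return Fraction(2, factorial(n))
    numerator = 2 * (n * (p * p + 3 * p + 1) - (p**3 + 3 * p * p - p - 4))
    return Fraction(numerator, factorial(p + 3))


for n in range(3, 8):
    exact = exact_expectations(n)
    assert all(exact[p] == closed_form(n, p) for p in range(1, n))
print(exact_expectations(5))
```

Enumeration is only feasible for small n, which is why closed forms (and, for distributions, the asymptotic results mentioned next) matter in practice.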

The author has recently obtained the asymptotic distributions of $r_p$, of
$r_1, r_2, \ldots, r_p$ jointly, and of related statistics. These distributions are
normal; hence certain quadratic forms in these variables have approximately the
chi-square distribution.

Anticipating somewhat the discussion to be given below, it may be mentioned
here that the quadratic forms in certain of the $r_p$ which Kermack and McKendrick
[15] use to test for randomness do not have the chi-square distribution which
Kermack and McKendrick impute to them. Wallis and Moore [17] first pointed
out that these quadratic forms were not the proper chi-square statistics for goodness
of fit because of the correlation among the $r_p$. The author's recent results
show that these forms do not have the chi-square distribution.
2. Remarks on applications of runs. Let us now turn to statistical applications
of some of the runs described above. Suppose we have a sample of m
random independent observations on one variate and a similar sample of n
observations on another variate. Suppose further that nothing is known a
priori about the distribution of each except that both are continuous, and it is
desired to test whether the two distributions are identical. This problem is of
great practical importance and occurs frequently. In quality control of manufactured
output it may occur, for example, if we wish to test whether the output
of two machines, two workers, two different processes, or that from raw material
obtained from two different sources, is the same. Naturally, the problem may
arise not only for two samples but, in general, for a larger number of samples.

The solution proposed in [3] is as follows: Let the m + n observations be 
arranged in order of, say, ascending size, and let each observation be replaced by 
F or S according as it comes from the first or second sample. The total number
U of runs in both F and S is the statistic to be used. Small values of U are the 
critical values for rejecting the hypothesis of identity of distributions. Thus in 
the example above of the seating of statisticians and engineers in the auditorium, 
a small value of U, which implies that the S (statisticians) and the E (engineers) 
each tend to bunch together, would be regarded as evidence that the statisticians 
and engineers present are not well acquainted with one another. 
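A minimal sketch of this procedure, assuming no ties among the observations: pool the samples, label each value F or S, and count U. For the critical value the sketch uses the standard null mean 1 + 2mn/(m + n) and variance 2mn(2mn − m − n)/[(m + n)²(m + n − 1)] of U together with a large-sample normal approximation; the approximation is our assumption here, and for small m and n the exact distribution of U should be used instead. The sample values are hypothetical.

```python
from math import erf, sqrt


def runs_statistic(x, y):
    """Total number of runs U after pooling x (labelled F) and y (labelled S).
    Assumes no ties among the pooled values."""
    pooled = sorted([(v, "F") for v in x] + [(v, "S") for v in y])
    labels = [label for _, label in pooled]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def runs_test(x, y):
    """Return (U, z, lower-tail p); small U is critical."""
    m, n = len(x), len(y)
    N = m + n
    u = runs_statistic(x, y)
    mean = 1 + 2 * m * n / N
    var = 2 * m * n * (2 * m * n - N) / (N**2 * (N - 1))
    z = (u - mean) / sqrt(var)
    p_lower = 0.5 * (1 + erf(z / sqrt(2)))   # normal approximation
    return u, z, p_lower


first = [1.1, 2.3, 2.9, 3.4, 4.0]    # hypothetical sample from the first variate
second = [5.2, 5.9, 6.1, 7.5, 8.8]   # hypothetical sample from the second variate
print(runs_test(first, second))      # U = 2: the two samples do not interleave at all
```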

The statistic U seems a not unreasonable one for the purpose. A discrepancy 
between the two distribution functions will make alternation of values of the two 
variates less frequent. This idea was proved for large n in [3], where a gener- 
alized concept of statistical consistency is given. 

On the other hand, the choice of U as a statistic is arbitrary; other reasonable
criteria can certainly be given (see, for example, Dixon [19]). In [3] it is shown
that a criterion which had previously been proposed was not acceptable because
the statistic was not consistent; nevertheless, consistency is a property enjoyed
by many statistics and constitutes only a partial check on the arbitrariness
of choice. An “abnormally” long run in one or both variates, which “common
sense” would regard as an indication that the hypothesis ought to be
rejected, might be accompanied by a large number of runs of length one, which
might make the value of U not critically low. Some writers suggest that the
presence of a run of sufficient length be regarded as indicating rejection of
the null hypothesis. In that case, if most of the runs were comparatively long
while none were critically long, the null hypothesis would not be rejected under
this criterion, even though the value of U would be small. A step has been made in the
direction of setting up a criterion for the choice of statistic ([6]) so as to remove
this arbitrariness. This involves an extension of the likelihood ratio principle.
It must be remembered, however, that almost any criterion will fail to reject
some sequence which, it seems intuitively, ought to be rejected. All statistical
inference involves risks of error; one object of the science of statistics is to minimize
these risks.

Another possible test for the problem of two samples is to compare the number
of runs of various lengths with their expected numbers by the proper chi-square
(caution: the correlation among the variates must be taken into account).
The author [6] has developed another test from an extension of the likelihood
ratio.

Whenever a uniformly most powerful test does not exist, and this is the case 
in most non-parametric problems, it is not usually possible to say that one test 
is more powerful than another, unless the set of alternatives is sufficiently de- 
limited. The power function is then the ultimate criterion for the choice of 
statistic. 

If a sequence of n unequal numbers is given, a very important question is to
decide whether the sequence is a “random” one; if it is, and the sequence represents
measurements on a characteristic of successive products of some manufacturing
process, the latter is said to be in statistical control. A precise mathematical
formulation can be given to this statement about randomness. Let
$X_1, X_2, \ldots, X_n$ be chance variables, and let $x_1, x_2, \ldots, x_n$ be a set of random
observations on the corresponding variables. To test whether $x_1, x_2, \ldots, x_n$
is a “random” sequence means to test the hypothesis that $X_1, X_2, \ldots, X_n$
are independently distributed and have identical distribution functions. This is
in general a difficult problem, chiefly because of the large class of alternatives to
the null hypothesis.

Since the null hypothesis does not specify the distribution functions but only 
asserts their identity, the tests most generally sought have been such that their 
size is independent of the unknown (but identical for all the chance variables) 
distribution function. Certain reasonable procedures have been based on the 
numbers and lengths of runs up and down in the sequence. 

R. A. Fisher [16] suggested doing this, but gave no indication as to what 
statistic was to be used. Kermack and McKendrick [15] and Wallis and Moore 
[17] propose the following procedure, the former writers implicitly and the latter 
explicitly: Let 

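$$r'_p = \sum_{i=p}^{n-1} r_i ,$$

and denote by $\bar{x}$ the expectation of the general chance variable $x$. The proposed
statistic is

$$\sum_{i=1}^{p-1} \frac{(r_i - \bar{r}_i)^2}{\bar{r}_i} + \frac{(r'_p - \bar{r}'_p)^2}{\bar{r}'_p} ,$$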
with the critical region the upper tail. Wallis and Moore recommend p = 3 and 
approximate the distribution by empirical methods. As we have seen above, 
Kermack and McKendrick err in ascribing to the statistic the chi-square distri- 
bution. 
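For p = 3 the statistic compares the numbers of runs of lengths 1, 2, and 3 or more with their expectations. The sketch below uses the exact expectations (5n + 1)/12, (11n − 14)/60 and (4n − 11)/60 of these three counts for a random permutation of n ≥ 4 distinct values; these come from the standard exact expectations of $r_1$ and $r_2$ and the expected total number of runs, (2n − 1)/3, which we take as given here since the text does not print them. As the text stresses, the value should not be referred to the chi-square table.

```python
def run_lengths(seq):
    """Lengths (number of + or - signs) of the runs up and down in seq."""
    lengths, previous = [], None
    for a, b in zip(seq, seq[1:]):
        up = b > a
        if up == previous:
            lengths[-1] += 1
        else:
            lengths.append(1)
            previous = up
    return lengths


def wallis_moore_statistic(seq):
    """Quadratic form in r_1, r_2 and r'_3 (runs of length >= 3), i.e. p = 3.

    Expectations assume a random permutation of n >= 4 distinct values.
    """
    n = len(seq)
    if n < 4:
        raise ValueError("need at least four observations")
    lengths = run_lengths(seq)
    observed = (
        lengths.count(1),
        lengths.count(2),
        sum(1 for length in lengths if length >= 3),
    )
    expected = ((5 * n + 1) / 12, (11 * n - 14) / 60, (4 * n - 11) / 60)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(wallis_moore_statistic([3, 4, 7, 6, 5, 1, 2]))
```

Its null distribution would have to be obtained separately, for instance empirically, as Wallis and Moore did, or from the joint normal limit of the $r_p$.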

The criticism has been made by Olmstead [19] that this statistic is insensitive
to pronounced trends in the data. This is correct, and it had been pointed out
earlier in [17], where the prior removal of a trend is recommended. Since one of
the important problems of quality control is the detection of a trend, this would limit
the usefulness of the statistic for quality control purposes.

It happens frequently when a new rank statistic has been proposed for testing 
a non-parametric hypothesis such as that of “randomness” above, that critics of 
the proposed criterion construct sequences which, they say, appealing to “ordi- 
nary common sense”, any reasonable statistic ought to place in the region of 
rejection for almost any size of test. They then cheerfully point to the fact that 
the proposed statistic does not act in this reasonable fashion. A few remarks 
about this may not be amiss. 

A test for, say, “randomness”, which is to be made on the sequence of ranks, is 
really a numbering of the n! permutations of, say, the first n integers, according 
to the order in which they ought to be taken into the critical region in order to 
make the latter of any prescribed size. This numbering could even be done by 
tabulating, for different n, the various sequences in their proper order. Aside
from the obvious practical obstacles to such a tabulation, a further difficulty would
soon arise: after the “obvious” sequences had been assigned their places, the
investigator would find it hard to order most of the remainder according
to the degree to which they may be held to “contradict” the null
hypothesis. Resort is therefore made to a statistic which can be given as an 
analytic expression in the ranks. Because of the inadequacies of the theory the 
formula is often chosen by analogy with a similar formula in classical statistics. 
Difficulties may arise because of this. 
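To make the point concrete, here is a toy sketch of a test as an ordering of the n! permutations. The statistic used to order them (the total number of runs up and down, fewest first) is only an illustrative choice of ours, and the function names are hypothetical. Because many permutations share the same value, the attainable sizes jump, so the sketch adds whole tie classes only and never exceeds the nominal size.

```python
from itertools import permutations
from math import factorial


def total_runs_up_and_down(perm):
    """Total number of runs up and down in a sequence of distinct numbers."""
    signs = [b > a for a, b in zip(perm, perm[1:])]
    return 1 + sum(s != t for s, t in zip(signs, signs[1:]))


def critical_region(n, alpha):
    """Permutations of 1..n taken into the critical region, fewest runs first,
    adding whole tie classes while the size stays at most alpha."""
    by_value = {}
    for perm in permutations(range(1, n + 1)):
        by_value.setdefault(total_runs_up_and_down(perm), []).append(perm)
    limit = alpha * factorial(n)
    region = []
    for value in sorted(by_value):
        if len(region) + len(by_value[value]) > limit:
            break
        region.extend(by_value[value])
    return region


region = critical_region(5, 0.05)
print(len(region), len(region) / factorial(5))   # actual size falls below 0.05
```

For n = 5 only the two monotone permutations make it into a region of nominal size .05; the next tie class (28 permutations with two runs) would overshoot. This is the practical face of the ordering problem described above.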

Let us examine for a moment this intuitive notion of reasonableness. Most 
people, and even most statisticians, would agree that the sequence of the first n 
integers in ascending order is an indication of non-randomness. The basis for 
this notion is an intuitive conception of an alternative to the null hypothesis for 
which this sequence is very probable. The fact is, however, that if we admit all 
alternatives to the hypothesis of randomness, for any sequence of ranks whatever 
there exist infinitely many alternatives which assign to this sequence a probability 
of one. 

It seems to us that the difficulty can be met to a large extent by delimiting the 
class of distributions which constitute the alternatives to the null hypothesis, 
and by assigning to the admissible alternatives a weight function which measures
the importance of the various alternatives (e.g., the financial loss caused by each). 
A profound treatment of this subject for the parametric case has been given by 
Wald [20]. This method has also the great merit that it removes the need for a 
choice of size of the region of rejection. 

In the control of the quality of mass production output one of the outstanding 
problems is to decide on the basis of a sequence of observations on the product, 
whether the production process is in statistical control. Shewhart and his 
school of industrial statisticians base many of their tests on the sequence of 
ranks. On the basis of their experience they find that the causes which most 
often lead to a breakdown of statistical control are such as to cause shifts up and 
down in the level of the observations or trends in the observations. To detect
the former they have devised the technique of runs above and below the median,
and to detect the latter they use runs up and down. Runs above and below the
median may be described briefly as follows: the 2m + 1 (odd number) observations
furnish a sequence of rankings from 1 to 2m + 1. The elements 1 to
m are considered to be elements of one kind and the elements m + 2 to 2m + 1
elements of another kind (the median, m + 1, is left out). We then have a special
case of runs of two kinds of elements. Limitations of space prevent the presentation
of more detail or a description of the ingenious scheme by which both kinds of runs
are graphically exhibited. The reader is referred to [14], [5], and [21], among
others. The tests used in the industrial applications are not always explicitly stated,
nor do they always seem to be the same. The most common involve comparison of runs of
various lengths with their expected number or else are based on the presence of
abnormally long runs.
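The classification into runs above and below the median can be sketched as follows, assuming an odd number of distinct observations. The labels A (above) and B (below) and the data are hypothetical; here a run's length is the number of elements in it, as usual for runs of two kinds of elements.

```python
def runs_about_median(observations):
    """Runs above (A) and below (B) the median for an odd number 2m + 1 of
    distinct observations; the median itself is dropped."""
    if len(observations) % 2 == 0:
        raise ValueError("expected an odd number of observations")
    median = sorted(observations)[len(observations) // 2]
    labels = ["A" if x > median else "B" for x in observations if x != median]
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([label, 1])   # start a new run
    return [(label, length) for label, length in runs]


data = [5.1, 4.8, 5.6, 5.9, 6.2, 4.7, 4.9]   # hypothetical measurements
runs = runs_about_median(data)
print(runs, "longest:", max(length for _, length in runs))
```

Both statistics mentioned in the text (the numbers of runs of various lengths, and the longest run) can be read off this list.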

A pretty application of the theory may be found in Campbell [21]. The cor- 
rosion of a copper plate was determined by a delicate mechanism which measured 
the electrical resistance in various places on the plate. The rectangular plate 
was divided by rows and columns into forty small rectangles in each of which a 
measurement was made. The readings were made in each column in successive 
order from one end to the other, and the columns were also measured in succes- 
sive order from one edge to the other. The observations, when examined for 
runs above and below the median and runs up and down, indicated something 
amiss (“absence of statistical control”). Two causes were considered possible:

(a) variations, over the plate, in the corrosion of the copper; 

(b) malfunctioning of the delicate measuring apparatus. 

The runs obtained by arranging the observations in successive order according 
to positions on the plate might be expected to be associated with (a), while the 
runs obtained by arranging the observations in temporal order might be expected 
to be associated with (b). The object was therefore to separate the two order- 
ings and this was done as follows: The rectangles were numbered 1 to 40 in the 
order in which the first observations had been made and a random permutation 
of this sequence was used to indicate the order in which the next set of observa- 
tions was to be made. The second set was then ordered in two different ways, 
first according to the temporal order of the observations, and second according 
to the original ordering by positions. The runs above and below the median and 
the runs up and down, in the first ordering of the second set of observations gave 
evidence of a lack of statistical control, while those in the second ordering of the 
same set did not. An investigation located the trouble in the measuring appa- 
ratus. 

3. Conclusion. The manifold achievements of quality control as it is prac- 
ticed at present point to the desirability of still further development of theory 
and practice. We conclude this paper by suggesting a few directions in which 
the theory of runs could develop and be of greater assistance in quality control. 

(1) The kinds of runs and the statistics used for making decisions about a production
process should be chosen on the basis of the kind of deviations from the
state of statistical control which the engineers consider most likely to occur.
It is very likely that different production processes may require different
statistical procedures.

(2) General distribution theorems should be developed, power functions ob- 
tained, and the correlations between different tests investigated. 

(3) The application of the weight function idea of minimizing financial losses 
should be considered. 

In these developments both engineers and mathematical statisticians would
have important and complementary roles. The tempo of progress will depend 
in large part on the cooperation between them. 
THE ACCURACY OF SAMPLING METHODS IN ECOLOGY

By Paul G. Hoel
University of California at Los Angeles

1. Introduction. For a number of years journals on ecology have contained 
articles on sampling techniques for estimating the distribution of common species 
of plants in various regions. Although much experimental work has been done 
on this problem and although the problem is essentially statistical in nature, no 
theoretical work of any consequence seems to have been attempted. This paper 
considers the question of the relative accuracy of common sampling methods 
from a theoretical point of view by means of geometrical probability and statisti- 
cal distribution theory. 

There are three common methods of sampling used by ecologists. They are 
designated by the names of coverage, abundance, and frequency. For each of 
these methods of sampling, there are two common choices of sampling unit,
namely, the quadrat and the transect. By the coverage of a species in a region 
is meant the total area covered by the projection on the ground of the crowns of 
the plants of this species. By abundance is meant the total number of plants 
of this species in the region. By frequency is meant the number of sampling 
units in the region in which at least one plant of the species occurs. A quadrat 
is a sampling unit in the form of a square, usually several yards on a side. A 
transect is a sampling unit in the form of a straight line, coverage in this case 
being the length of line covered by the projection of the crowns. 

In this paper it will be assumed that plants possess circular crowns. Further, 
it will be assumed that plant species distribute themselves at random in the 
region to be sampled. This is not necessarily the case, since there is often a 
tendency for plants of a given species to distribute themselves at random or 
otherwise in groups rather than as single plants. However, if sampling units 
are somewhat comparable in size, the relative accuracy of these methods of 
sampling based on a random distribution would be expected to hold fairly well 
for distributions somewhat removed from this ideal situation. Further, by the 
proper choice of sampling unit size, some non-random distributions behave very
much as though they were random. 

The accuracy of a sampling method may be measured by the variance of the 
estimate of the quantity which is of interest. Here interest will be centered 
on the total coverage of a given species in the region being sampled. Thus, two 
sampling methods will be said to be equally accurate for coverage if they produce 
equal variances for the estimate of total coverage. 

The quadrat unit of sampling will be considered first for the three methods 
of sampling, after which the transect unit will be considered. 
2. Quadrat coverage. Let the region to be sampled be a square B units on
a side. Let there be n quadrats, each a square A units on a side, distributed
at random in the region. Finally, let the total number of plants of the species
in question in the region be N, with the distribution of the radius of their crowns
given by a frequency function f(r) whose explicit form will be specified later.

First, consider a single plant of radius r and a single quadrat. The problem 
is to determine the variance of a, the area of that part of the plant lying in the 
quadrat. Now the probability that this plant will be found in any particular 
part of the. region is obtained by treating the plant as a circle of radius r which 
is thrown at random in the region and then applying geometrical probability 
to the position of the center of the circle. Thus, considering only those situations 
when the center of the circle lies in the region, the probability that the circle 
will cover an area of at least a units of the quadrat is given by the ratio
of the area of the subregion inside the quadrat whose boundary is the locus of
centers of circles of radius r which have precisely a units of their area inside
the quadrat, to the area of the region. Probabilities of this type may be treated
as functions of a. The expressions below for such probabilities follow directly
from Fig. 1, which displays one corner of the quadrat.

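$$
\begin{aligned}
P_1[a < \text{area} < \pi r^2] &= 4S_1/B^2, \qquad a > \pi r^2/2,\\
P_2[0 < \text{area} < a] &= 4S_2/B^2, \qquad a < \pi r^2/2,\\
P_3[\text{area} = \pi r^2] &= (A - 2r)^2/B^2,\\
P_4[\text{area} = 0] &= P_4 .
\end{aligned} \tag{1}
$$

Now

$$S_1 = (A - r)(r - z) - \int_z^r y\,dx, \tag{2}$$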




where y is the ordinate of curve $C_1$. Likewise

$$S_2 = A(r + z) - z^2 + \cdots - \int y'\,dx', \tag{3}$$

where $y'$ is the ordinate of curve $C_2$ with respect to the primed axes and z is
negative. Using the formula for the area of a segment of a circle, the equation
of $C_1$ is easily found to be


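$$x\sqrt{r^2 - x^2} + r^2\sin^{-1}\frac{x}{r} + y\sqrt{r^2 - y^2} + r^2\sin^{-1}\frac{y}{r} = a, \qquad x^2 + y^2 \ge r^2, \tag{4}$$

$$x\sqrt{r^2 - x^2} + r^2\sin^{-1}\frac{x}{r} + y\sqrt{r^2 - y^2} + r^2\sin^{-1}\frac{y}{r} + 2xy + \frac{\pi r^2}{2} = 2a, \qquad x^2 + y^2 < r^2, \tag{5}$$

where the value of a is given in terms of z by

$$z\sqrt{r^2 - z^2} + r^2\sin^{-1}\frac{z}{r} + \frac{\pi r^2}{2} = a. \tag{6}$$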


The equation of $C_2$ is given by (5) with z negative. These equations do not
permit the solution of y in terms of x; however, they can be thrown into the 
following parametric form with t as parameter: 


(4') 


(50 



1 

2 cos 
Icos 
1 

2 cos 
icos 



r 2a/r^ — ir/2 + cos t — 
L 1 + sin t 

r 2a/r* — ir/2 4- cos t — 
L 1 + sin < 



Since a may be treated as a parameter, equations (4) and (5), and hence (4′)
and (5′), represent a system of curves $C_1$ and $C_2$. Unfortunately, equations
(4′) and (5′) are not convenient for integration purposes either, but they are
convenient for numerical work. This system of curves can be approximated
satisfactorily by means of simpler curves. One set of such approximating
curves is the following system of circles:

$$(x - r)^2 + (y - r)^2 = (r - z)^2, \qquad z > 0, \tag{7}$$

$$\left(x - \sqrt{r^2 - z^2}\right)^2 + (y - \cdots)^2 = (-z + \cdots)^2, \qquad z < 0. \tag{8}$$


Although inequalities may be obtained between the approximating and true
curves, these are of little value for determining the accuracy of essential moments
obtained by using these approximating curves; therefore the accuracy of these
approximating curves will be judged empirically by means of Fig. 2, in which
the true curves are plotted by means of (4′) and (5′) for z = .6r, .3r, 0, −.3r, −.6r
and −.9r with solid lines, and the approximating circles (7) and (8) with broken
lines. Although the circles appear to fit poorly for relatively large positive
values of z, this is not serious, because these values occur less often
than other values of z for a random circle and because the use of these circles
is confined to the rate of change of area bounded by these curves and the lines
x = r and y = r. Since the true curves approach the circles with decreasing
positive z, their rate of change of area would not differ much from that
for the circles even though the circles include larger areas for a given z. In the
paragraph following (11), further evidence will be presented to show that for the
computation of the first two moments of a, these curves give a good approximation.


Fig. 2
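Equations (4) to (6) can be spot-checked numerically. The sketch below places the quadrat corner at the origin with the quadrat in the first quadrant, x and y being the coordinates of the circle's center (our choice of axes, which may differ from Fig. 1). It evaluates the covered area from (4) or (5) and compares it with a direct midpoint-rule integration over the circle. Formula (4) is used only for centers with x, y ≥ 0, where it applies; the helper names are hypothetical.

```python
import math


def F(u, r):
    """u*sqrt(r^2 - u^2) + r^2*asin(u/r), with u clipped to [-r, r]."""
    u = max(-r, min(r, u))
    return u * math.sqrt(r * r - u * u) + r * r * math.asin(u / r)


def half_plane_area(z, r):
    """Equation (6): area of the circle on the quadrat side of one edge when
    the center lies a signed distance z inside that edge."""
    return F(z, r) + math.pi * r * r / 2


def corner_area(x, y, r):
    """Area of the circle with center (x, y) lying in the quadrant X > 0, Y > 0."""
    if x * x + y * y < r * r:                      # corner inside the circle: (5)
        return 0.5 * (F(x, r) + F(y, r) + 2 * x * y + math.pi * r * r / 2)
    if x < 0 or y < 0:
        raise ValueError("formula (4) is used here only for x, y >= 0")
    return F(x, r) + F(y, r)                       # corner outside the circle: (4)


def corner_area_numeric(x, y, r, steps=20000):
    """Midpoint-rule integral of the vertical chord lengths lying in Y > 0."""
    lo, hi = max(0.0, x - r), x + r
    if hi <= lo:
        return 0.0
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        X = lo + (i + 0.5) * dx
        h = math.sqrt(max(0.0, r * r - (X - x) ** 2))
        total += max(0.0, (y + h) - max(0.0, y - h)) * dx
    return total


for center in [(0.3, 0.2), (0.9, 0.8), (-0.3, 0.4), (0.5, 1.5)]:
    print(center, corner_area(*center, 1.0), corner_area_numeric(*center, 1.0))
```

Agreement to several decimals across both branches supports (4) and (5) as they are used to define the curves $C_1$ and $C_2$.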

For the purpose of obtaining the variance of a, consider the expected value of
$a^2$. Since the variable a may be thought of as the sum of three variables which
assume only the values 0, $\pi r^2$, and $0 < a < \pi r^2$, from (1) it follows that

$$E(a^2) = (\pi r^2)^2 P_3 + \int_{\pi r^2/2}^{\pi r^2} a^2 f_1(a)\,da + \int_0^{\pi r^2/2} a^2 f_2(a)\,da,$$

where $f_1(a)$ and $f_2(a)$ are the frequency functions for z > 0 and z < 0 respectively.
Now since

$$P_1[a < \text{area} < \pi r^2] = \int_a^{\pi r^2} f_1(a)\,da \qquad\text{and}\qquad P_2[0 < \text{area} < a] = \int_0^a f_2(a)\,da,$$

it follows from (1), (2), and (3) that


$$f_1(a)\,da = -dP_1 = -\frac{4}{B^2}\,dS_1 = \frac{4}{B^2}\left[(A - r)\,dz + d\!\int_z^r y\,dx\right],$$

$$f_2(a)\,da = dP_2 = \frac{4}{B^2}\,dS_2 .$$


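Using the approximating curves (7) and (8), these integrals become:

$$\int_z^r y\,dx = r(r - z) - \frac{\pi}{4}(r - z)^2, \qquad \int y'\,dx' = \cdots .$$

Hence,

$$f_1(a)\,da = \frac{4}{B^2}\left[A - 2r\left(1 - \frac{\pi}{4}\right) - \frac{\pi}{2}z\right]dz,$$

$$f_2(a)\,da = \frac{4}{B^2}\left[A - 2z - \cdots\right]dz .$$

Hence,

$$E(a^2) = (\pi r^2)^2\frac{(A - 2r)^2}{B^2} + \frac{4}{B^2}\int_0^r a^2\left[A - 2r\left(1 - \frac{\pi}{4}\right) - \frac{\pi}{2}z\right]dz + \frac{4}{B^2}\int_{-r}^0 a^2\left[A - 2z - \cdots\right]dz .$$

Substituting the value of a from (6), standard integrals give the following
values of $E(a^k)$ for k = 1 and k = 2: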

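$$E(a) = \frac{\pi r^4}{B^2}\left[\left(\frac{A}{r}\right)^2 + .13\right], \tag{9}$$

$$E(a^2) = \frac{\pi^2 r^6}{B^2}\left[\left(\frac{A}{r}\right)^2 - 1.15\,\frac{A}{r} + .46\right], \tag{10}$$

where certain constants involving π have been evaluated to two decimals.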

If circles with centers outside the region but overlapping the region were
also measured, then geometrical probability would give the following value
for E(a):

$$E(a) = \frac{\pi r^2 A^2}{B^2} = \frac{\pi r^4}{B^2}\left(\frac{A}{r}\right)^2 .$$
Since in (1) only circles with centers inside the region are assumed measured,
E(a) will be only very slightly larger than this last value; consequently the
approximate result in (9) is only slightly in error. For a quadrat ten yards on
a side and plants averaging two yards in diameter, the error is in the neighborhood
of one tenth of one percent; consequently formula (10) may be expected
to be quite accurate as well. Another approximating system of curves, lying
largely on the opposite side of the true curves from the circles, gave formula
(10) with .46 replaced by .26; either constant has a negligible effect on $E(a^2)$
for ordinary applications.
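A rough check on the coefficient 1.15 in (10), using a leading-order argument of our own rather than the author's computation: the fully covered central square contributes $(\pi r^2)^2(A - 2r)^2/B^2$, and away from the corners each of the four edges contributes about $(4A/B^2)\int_{-r}^{r} a^2\,dz$ in total, with a given by (6). Collecting the terms in $A r^5$ gives $4I/\pi^2 - 4$ as the coefficient of $A/r$, where $I = \int_{-1}^{1}(\pi/2 + u\sqrt{1 - u^2} + \sin^{-1}u)^2\,du$; the corners affect only the constant term. The sketch evaluates I by Simpson's rule and compares it with the closed form $\pi^2 - 128/45$, which we obtained by hand.

```python
import math


def integrand(u):
    # (a / r^2)^2 from equation (6), with z = u * r
    return (math.pi / 2 + u * math.sqrt(1 - u * u) + math.asin(u)) ** 2


def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3


I = simpson(integrand, -1.0, 1.0)
coefficient = 4 * I / math.pi**2 - 4
print(I, math.pi**2 - 128 / 45)
print(coefficient, -512 / (45 * math.pi**2))   # both about -1.153
```

The value −512/(45π²) ≈ −1.153 rounds to the −1.15 of (10), which is consistent with the edge strips being computed from the exact (6) rather than from the approximating circles.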

Formula (10) was derived on the assumption that the same circle was thrown
repeatedly at random in the region. Consider now the situation when the
circle varies in size according to the frequency function f(r). Treating a and r
as two statistical variables, their joint frequency function may be expressed as:

$$f(a, r) = f(r)\,f(a \mid r),$$

where $f(a \mid r)$ is the frequency function of a when r has the fixed value r. Letting
$\mathcal{E}(a^k)$ represent the expected value of $a^k$ when r is permitted to vary according
to f(r),

$$\mathcal{E}(a^k) = \iint a^k f(a, r)\,da\,dr = \int f(r)\int a^k f(a \mid r)\,da\,dr = \int f(r)\,E(a^k)\,dr, \tag{11}$$

where all integrals are taken over the regions for which a and r are defined.
Consequently, from (10) and (11)

$$\mathcal{E}(a^2) = \frac{\pi^2}{B^2}\left[A^2\nu_4 - 1.15A\nu_5 + .46\nu_6\right]$$

and

$$\mathcal{E}(a) = \frac{\pi A^2 \nu_2}{B^2}, \tag{12}$$

where the ν's represent moments of r. Hence the variance of a is given by:

$$\sigma_a^2 = \frac{\pi^2}{B^2}\left[A^2\nu_4 - 1.15A\nu_5 + .46\nu_6\right] - \frac{\pi^2 A^4 \nu_2^2}{B^4}. \tag{13}$$

Finally, let there be n quadrats and N circles whose radii vary according to f(r),
and let the total area of quadrat covered by the N circles be denoted by s.
Then

$$\mathcal{E}(s) = nN\,\mathcal{E}(a) \tag{14}$$

and

$$\sigma_s^2 = nN\sigma_a^2 . \tag{15}$$

The purpose of measuring s is to use it to obtain an estimate of T, the total area
of the N circles. But

$$T = N\int \pi r^2 f(r)\,dr = N\pi\nu_2 . \tag{16}$$

Substituting the value of $\nu_2$ from (12) and using (14),

$$T = \frac{B^2\,\mathcal{E}(s)}{nA^2} .$$

Hence an estimate of T will be given by

$$T_1 = \frac{B^2 s}{nA^2} . \tag{17}$$

Using (15) and (13), the variance of this estimate will be given by

$$\sigma_{T_1}^2 = \frac{\pi B^2 T}{nA^2\nu_2}\left[\nu_4 - 1.15\,\frac{\nu_5}{A} + .46\,\frac{\nu_6}{A^2}\right] - \frac{\pi T\nu_2}{n}. \tag{18}$$



3. Quadrat abundance. In this method the sampler merely counts the number
of plants of the given species in each quadrat. Although this method was
designed to estimate the total number of plants, it may be adapted to estimate
total coverage as well. Since it is the practice to count a plant as lying in the
quadrat only if its stem is in the quadrat, the probability that this event will occur
is given by:

$$P_q = \frac{A^2}{B^2}. \tag{19}$$

Since there are n quadrats and N circles, the number of circles with centers
lying in quadrats, which will be denoted by s, will follow the binomial distribution;
hence

$$\mathcal{E}(s) = nNP_q \tag{20}$$

and

$$\sigma_s^2 = nNP_q(1 - P_q). \tag{21}$$

From (16) and (20) it follows that

$$T = \frac{\pi\nu_2\,\mathcal{E}(s)}{nP_q}.$$

Therefore an estimate of T will be given by

$$T_2 = \frac{\pi B^2 m_2 s}{nA^2}, \tag{22}$$

where $m_2$ is a sample estimate of $\nu_2$ obtained by measuring the diameters of k
plants and calculating their mean area. Since $m_2$ and s are independent, a
standard formula for the variance of a product of two independent variables
may be applied to give

$$\sigma_{T_2}^2 = \frac{\pi^2 B^4}{n^2 A^4}\left[\mathcal{E}^2(m_2)\,\sigma_s^2 + \mathcal{E}^2(s)\,\sigma_{m_2}^2\right].$$
But

$$\sigma_{m_2}^2 = \frac{\nu_4 - \nu_2^2}{k}$$

and

$$\mathcal{E}(m_2) = \nu_2 .$$

Consequently, with the aid of (19), (20), and (21),

$$\sigma_{T_2}^2 = \frac{\pi T\nu_2\,(B^2 - A^2)}{nA^2} + \frac{T^2(\nu_4 - \nu_2^2)}{k\,\nu_2^2}. \tag{23}$$
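A numerical comparison of the two quadrat methods needs the moments ν of the radius. The sketch below takes them from a gamma (Pearson type III from zero) radius distribution with mean μ and coefficient of variation V, whose raw moments are $\mu^m\prod_{j=1}^{m-1}(1 + jV^2)$; that choice and every parameter value are hypothetical. The variances are coded as $\sigma^2_{T_1} = \frac{\pi B^2 T}{nA^2\nu_2}[\nu_4 - 1.15\nu_5/A + .46\nu_6/A^2] - \pi T\nu_2/n$ for quadrat coverage and $\sigma^2_{T_2} = \pi T\nu_2(B^2 - A^2)/(nA^2) + T^2(\nu_4 - \nu_2^2)/(k\nu_2^2)$ for quadrat abundance.

```python
import math


def gamma_moment(m, mu, V):
    """Raw moment E(r^m) of a gamma radius distribution with mean mu and
    coefficient of variation V."""
    result = mu**m
    for j in range(1, m):
        result *= 1 + j * V * V
    return result


def var_T1(T, A, B, n, nu):
    """Variance of the quadrat-coverage estimate T_1."""
    bracket = nu[4] - 1.15 * nu[5] / A + 0.46 * nu[6] / A**2
    return math.pi * B**2 * T / (n * A**2 * nu[2]) * bracket - math.pi * T * nu[2] / n


def var_T2(T, A, B, n, k, nu):
    """Variance of the quadrat-abundance estimate T_2 (k plants measured)."""
    return (math.pi * T * nu[2] * (B**2 - A**2) / (n * A**2)
            + T**2 * (nu[4] - nu[2] ** 2) / (k * nu[2] ** 2))


# Hypothetical setting, distances in yards.
mu, V = 1.0, 0.5            # mean crown radius and its coefficient of variation
A, B, n, k = 10.0, 1000.0, 50, 25
N = 10_000                  # number of plants of the species
nu = {m: gamma_moment(m, mu, V) for m in range(1, 7)}
T = N * math.pi * nu[2]     # total crown area, T = N pi nu_2
for name, var in [("T1", var_T1(T, A, B, n, nu)), ("T2", var_T2(T, A, B, n, k, nu))]:
    print(name, "relative standard error:", math.sqrt(var) / T)
```

Changing A, n or k in the usage lines shows how the comparison between the two methods shifts with quadrat size, number of quadrats, and the number of plants measured for $m_2$.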


4. Quadrat frequency. In this method the sampler records the number of
quadrats observed and the number of those quadrats which contained at least
one plant of the given species. Given N plants, the probability p that at least
one of them will be found in a given quadrat is given by

$$p = 1 - (1 - P_q)^N,$$

where $P_q$ is given in (19). For n quadrats the expected number of quadrats
in which at least one plant will be found is therefore np. Letting w represent
the number of such quadrats,

$$\mathcal{E}(w) = n\left[1 - (1 - P_q)^N\right].$$

Solving for N,

$$N = \frac{\log\left[1 - \mathcal{E}(w)/n\right]}{\log\left[1 - P_q\right]}.$$

Consequently, from (16) an estimate of T will be given by

$$T_3 = \frac{\pi m_2 \log\left[1 - w/n\right]}{\log\left[1 - P_q\right]}.$$

Neither the mean nor the variance of $T_3$ will exist, because $T_3$ is a discrete variable
which becomes infinite for w = n. Unless the density of the species is very low,
values of w near n will occur quite often and hence cause $T_3$ to vary widely.
Consequently the frequency method will be inferior to the abundance method
except when the mean density is low, in which case the abundance method is
practically as easy to apply. Because the frequency method is obviously inferior
to the abundance method, it will not be considered further here.
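The instability of the frequency method is easy to see numerically. The sketch inverts $\mathcal{E}(w) = n[1 - (1 - P_q)^N]$ with the observed w in place of its expectation, as in the estimate $T_3$ (which is this quantity times $\pi m_2$). The function name and the values of A, B and n are hypothetical.

```python
import math


def estimate_N_frequency(w, n, P_q):
    """Estimate of N from w occupied quadrats out of n; infinite when every
    quadrat is occupied."""
    if w >= n:
        return math.inf
    return math.log(1 - w / n) / math.log(1 - P_q)


P_q = (10.0 / 1000.0) ** 2        # hypothetical A^2 / B^2
for w in (10, 40, 49, 50):
    print(w, estimate_N_frequency(w, 50, P_q))
```

The estimate grows steeply as w approaches n and is infinite at w = n, which is why $T_3$ has neither a mean nor a variance.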


5. Transect coverage. In this type of sampling a line is laid down and the
length of line covered by a plant of the species in question is recorded. Let
there be n such lines, each L units in length.

If a circle of radius r is thrown at random in the region, it will cross a line
only if its center lies within the subregion, indicated in Fig. 3, composed of a
rectangle of width 2r and length L with semi-circular ends. From this figure it
is clear that the probability of the circle intersecting some positive length less
than z of the line is given by four times the shaded area $S_3$, divided by the area
of the region. From this same diagram the following equations of the indicated
curves result:


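$$C_3:\ (x - L/2)^2 + y^2 = r^2, \qquad C_4:\ y = \sqrt{r^2 - z^2/4}, \qquad C_5:\ (x - L/2 + z)^2 + y^2 = r^2 .$$

Applying geometrical probability,

$$P_s[0 < \text{intercepted length} < z] = \frac{4S_3}{B^2} = \int_0^z f(z)\,dz,$$

where f(z) is the frequency function for z. But

$$S_3 = \frac{L}{2}\left[r - \sqrt{r^2 - z^2/4}\right] + z\sqrt{r^2 - z^2/4} + \int_{\sqrt{r^2 - z^2/4}}^{r}\sqrt{r^2 - y^2}\,dy .$$

Standard integrals give

$$P_s = \frac{4}{B^2}\left\{\frac{L}{2}\left[r - \sqrt{r^2 - z^2/4}\right] + \frac{3z}{4}\sqrt{r^2 - z^2/4} + \frac{r^2}{2}\sin^{-1}\frac{z}{2r}\right\}.$$

Consequently,

$$f(z) = \frac{8r^2 + Lz - 3z^2}{B^2\sqrt{4r^2 - z^2}}, \qquad 0 < z < 2r .$$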
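From this relation the following moments are readily obtained:

$$E(z) = \frac{\pi r^2 L}{B^2}, \qquad E(z^2) = \frac{\frac{16}{3}Lr^3 - \pi r^4}{B^2}.$$

For variable r these formulas become:

$$\mathcal{E}(z) = \frac{\pi\nu_2 L}{B^2}, \qquad \mathcal{E}(z^2) = \frac{\frac{16}{3}L\nu_3 - \pi\nu_4}{B^2}, \qquad \sigma_z^2 = \frac{\frac{16}{3}L\nu_3 - \pi\nu_4}{B^2} - \frac{\pi^2\nu_2^2L^2}{B^4}.$$

Let ζ denote the total z for N circles and n lines; then

$$\mathcal{E}(\zeta) = \frac{nN\pi\nu_2 L}{B^2} \tag{24}$$

and

$$\sigma_\zeta^2 = nN\left[\frac{\frac{16}{3}L\nu_3 - \pi\nu_4}{B^2} - \frac{\pi^2\nu_2^2L^2}{B^4}\right].$$

From (16) and (24)

$$T = \frac{B^2\,\mathcal{E}(\zeta)}{nL}.$$

Hence an estimate of T will be given by

$$T_4 = \frac{B^2\zeta}{nL}, \tag{25}$$

and its variance will be given by

$$\sigma_{T_4}^2 = \frac{NB^2}{nL^2}\left[\frac{16}{3}L\nu_3 - \pi\nu_4 - \frac{\pi^2\nu_2^2L^2}{B^2}\right]. \tag{26}$$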
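The transect moments $E(z) = \pi r^2 L/B^2$ and $E(z^2) = [\frac{16}{3}Lr^3 - \pi r^4]/B^2$ can be checked by brute force. The sketch drops the common factor $1/B^2$ and integrates the intercepted length, and its square, over all circle centers near a line segment of length L with a midpoint grid; r = 1 and L = 5 are arbitrary test values and the helper names are hypothetical.

```python
import math


def intercept(cx, d, r, L):
    """Length of the segment [0, L] on the x-axis covered by a circle of
    radius r centered at (cx, d)."""
    if abs(d) >= r:
        return 0.0
    h = math.sqrt(r * r - d * d)
    return max(0.0, min(cx + h, L) - max(cx - h, 0.0))


def intercept_moments(r, L, nx=700, nd=400):
    """Midpoint-rule integrals of z and z^2 over all center positions,
    i.e. B^2 E(z) and B^2 E(z^2)."""
    dx, dd = (L + 2 * r) / nx, 2 * r / nd
    first = second = 0.0
    for i in range(nx):
        cx = -r + (i + 0.5) * dx
        for j in range(nd):
            z = intercept(cx, -r + (j + 0.5) * dd, r, L)
            first += z
            second += z * z
    return first * dx * dd, second * dx * dd


r, L = 1.0, 5.0
m1, m2 = intercept_moments(r, L)
print(m1, math.pi * r**2 * L)
print(m2, 16 * L * r**3 / 3 - math.pi * r**4)
```

The −πr⁴ term in the second moment comes entirely from circles near the two ends of the line, so it is exact for any L ≥ 2r, not just a large-L approximation.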

. 6. Transect abtmdance. Since the probability, P« , of a circle of radius r 
intersecting a line of length L is the area of the band with semi-circular ends 
indicated in Fig. 3, divided by the area of the region, 

Pt = [2rL -I- in^]/B\ 

Hence, letting s represent the total number of intersections, as in the case of 
quadrat abundance, 

$$E(s) = nNP_4,$$

$$E(s^2) = nNP_4(1 - P_4) + n^2N^2P_4^2,$$

$$E(s) = nN[2L\nu_1 + \pi\nu_2]/B^2, \tag{27}$$

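$$E(s^2) = \frac{nN}{B^2}\Big\{[2L\nu_1 + \pi\nu_2] + \frac{nN - 1}{B^2}\big[4L^2\nu_2 + 4\pi L\nu_3 + \pi^2\nu_4\big]\Big\}.$$

For simplicity of formula, if $nN - 1$ is replaced by $nN$, the variance of $s$ becomes

$$\sigma_s^2 = \frac{nN}{B^2}\Big\{[2L\nu_1 + \pi\nu_2] + \frac{nN}{B^2}\big[4L^2(\nu_2 - \nu_1^2) + 4\pi L(\nu_3 - \nu_1\nu_2) + \pi^2(\nu_4 - \nu_2^2)\big]\Big\}. \tag{28}$$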
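The effect of replacing $nN - 1$ by $nN$ can be seen numerically. In the sketch below, `var_s_exact` uses the expression for $E(s^2)$ before the simplification, and `var_s_28` is (28). The function names are hypothetical, and `nu` maps $k$ to $\nu_k$, the $k$-th moment of $r$. For a fixed radius, $\nu_k = r^k$. In that case the exact form reduces to the binomial variance $nNP(1 - P)$ with $P = [2rL + \pi r^2]/B^2$, and the two forms differ by $nN[4L^2\nu_2 + 4\pi L\nu_3 + \pi^2\nu_4]/B^4$.

```python
import math


def var_s_exact(n, N, L, B2, nu):
    """Var(s) from E(s) in (27) and E(s^2) with the factor nN - 1."""
    a = 2 * L * nu[1] + math.pi * nu[2]
    b = 4 * L * L * nu[2] + 4 * math.pi * L * nu[3] + math.pi ** 2 * nu[4]
    mean = n * N * a / B2
    second = n * N / B2 * (a + (n * N - 1) / B2 * b)
    return second - mean ** 2


def var_s_28(n, N, L, B2, nu):
    """Formula (28): the same variance with nN - 1 replaced by nN."""
    a = 2 * L * nu[1] + math.pi * nu[2]
    c = (4 * L * L * (nu[2] - nu[1] ** 2)
         + 4 * math.pi * L * (nu[3] - nu[1] * nu[2])
         + math.pi ** 2 * (nu[4] - nu[2] ** 2))
    return n * N / B2 * (a + n * N / B2 * c)


r, L, B2, n, N = 1.5, 50.0, 1.0e6, 4, 300
nu = {k: r ** k for k in range(1, 5)}   # fixed radius: nu_k = r^k
P = (2 * r * L + math.pi * r * r) / B2
print(var_s_exact(n, N, L, B2, nu), n * N * P * (1 - P))
print(var_s_28(n, N, L, B2, nu))
```

The difference is of relative order $P$, which is small whenever the plants are small compared with the region.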
From (16) and (27),

$$T = \frac{\pi\nu_2B^2E(s)}{n[2L\nu_1 + \pi\nu_2]}.$$

Hence an estimate of $T$ will be given by

$$T_2 = \frac{\pi B^2 s}{n[\pi + 2La]},$$

where $a$ is an estimate of $\nu_1/\nu_2$. In order to obtain a satisfactory estimate of
$\nu_1/\nu_2$, data for the distribution of diameters of common California shrubs were
analyzed. It was found that Pearson's type three curve gave an excellent fit.
Since the moments of this type of distribution are given by

$$\nu_m = \mu^m\prod_{j=1}^{m-1}\big[1 + jV^2\big], \tag{29}$$

where $\mu$ is the mean and $V$ is the coefficient of variation, $\sigma/\mu$, then $\nu_1/\nu_2 = 1/\mu\theta$,
where $\theta = 1 + V^2$, and the above estimate becomes

$$T_2 = \frac{B^2 s}{n}\Big[1 - \frac{2L}{\pi\mu\theta + 2L}\cdot\frac{1}{1 + \psi}\Big], \tag{30}$$

where $\psi = \pi\theta[\xi - \mu]/[\pi\mu\theta + 2L]$ and where $1/\xi$ is chosen as an estimate of
$1/\mu$. Since $\xi$ will be approximately normally distributed for samples considerably
smaller than those usually taken to find $\xi$, assume that $\psi$ is normally distributed
with mean zero and variance $\sigma^2\pi^2\theta^2/k[\pi\mu\theta + 2L]^2$. Since $L$ is large relative
to $\sigma$ and since $k$ will usually exceed twenty-five, this variance is very small,
and hence the probability of $\psi$ exceeding one numerically is extremely small.
Although the value $\psi = -1$ is theoretically possible on the normality assumption,
such a value would not permit the existence of either the mean or the variance of
$1/[1 + \psi]$. However, if $\psi$ is restricted to a range of, say, ten standard deviations
about zero, then $|\psi| < 1$ under ordinary conditions and the variance will exist.
Further, because $\psi$ assumes such small values, with this finite range the variance
of $1/[1 + \psi]$ is the same as the variance of $\psi$ itself if higher powers in this variance
are neglected. Since $s$ and $\psi$ are independent, the same product formula that
was used for quadrat abundance may be employed here, together with the
various approximations indicated above, to yield


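$$\sigma_{T_2}^2 = \frac{B^2N}{n}\Big\{[2L\nu_1 + \pi\nu_2] + \frac{nN}{B^2}\big[4L^2(\nu_2 - \nu_1^2) + 4\pi L(\nu_3 - \nu_1\nu_2) + \pi^2(\nu_4 - \nu_2^2)\big]\Big\}\Big[\Big(\frac{\pi\mu\theta}{2L + \pi\mu\theta}\Big)^2 + \frac{4\pi^2L^2\theta^2\sigma^2}{k(2L + \pi\mu\theta)^4}\Big] + \frac{4\pi^2L^2\theta^2\sigma^2N^2[2L\nu_1 + \pi\nu_2]^2}{k(2L + \pi\mu\theta)^4}. \tag{31}$$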
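The estimator (30) and the variance (31) can be sketched as follows. The sketch makes several assumptions. $\nu_1, \ldots, \nu_4$ come from the type three moments (29). $\sigma = V\mu$ is the standard deviation of $r$. $k$ is the number of measurements from which $\xi$ is computed, which is our reading of the text. All function names are hypothetical. Writing $1/(1 + \psi)$ with $\xi$ in place of $\mu$ shows that (30) is simply $T_2 = (B^2s/n)\,\pi\xi\theta/(\pi\xi\theta + 2L)$, and that is the form the code evaluates. The variance applies the product formula $\mathrm{Var}(XY) = E(X)^2\mathrm{Var}(Y) + E(Y)^2\mathrm{Var}(X) + \mathrm{Var}(X)\mathrm{Var}(Y)$ with $X = B^2s/n$ and $Y$ the bracket in (30).

```python
import math


def type3_nu(mu, V, kmax=4):
    """Moments (29) of r: nu_m = mu^m * prod_{j=1}^{m-1} (1 + j V^2)."""
    return {m: mu ** m * math.prod(1 + j * V * V for j in range(1, m))
            for m in range(1, kmax + 1)}


def T2(s, n, L, B2, xi, V):
    """Estimator (30) with psi written out: (B^2 s/n) * pi xi theta / (pi xi theta + 2L)."""
    theta = 1 + V * V
    return B2 * s / n * math.pi * xi * theta / (math.pi * xi * theta + 2 * L)


def var_T2(N, n, L, B2, mu, V, k):
    """Formula (31): product-formula variance of T2, with s and psi independent."""
    nu = type3_nu(mu, V)
    theta, sigma = 1 + V * V, V * mu
    a = 2 * L * nu[1] + math.pi * nu[2]
    c = (4 * L * L * (nu[2] - nu[1] ** 2) + 4 * math.pi * L * (nu[3] - nu[1] * nu[2])
         + math.pi ** 2 * (nu[4] - nu[2] ** 2))
    d = 2 * L + math.pi * mu * theta
    var_x = B2 * N / n * (a + n * N / B2 * c)  # Var(B^2 s / n), from (28)
    var_y = 4 * math.pi ** 2 * L * L * theta ** 2 * sigma ** 2 / (k * d ** 4)
    mean_y = math.pi * mu * theta / d
    return var_x * (mean_y ** 2 + var_y) + (N * a) ** 2 * var_y


mu, V, L, B2, n, N = 1.2, 1 / 3, 100.0, 1.0e6, 5, 400
nu = type3_nu(mu, V)
s_mean = n * N * (2 * L * nu[1] + math.pi * nu[2]) / B2  # E(s) from (27)
print(T2(s_mean, n, L, B2, mu, V), N * math.pi * nu[2])   # agree when xi = mu
print(var_T2(N, n, L, B2, mu, V, k=30))
```

With $s$ at its expectation and $\xi = \mu$, the estimate returns $N\pi\nu_2$, the total coverage implied by (16) and (27). Passing `k=float("inf")` removes the contribution of the error in $\xi$.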
7. Comparison of methods. Formulas (18) and (23) may be compared for
relative accuracy of these two quadrat methods of measuring coverage. For- 
mulas (26) and (31) may be compared for relative accuracy of these two transect 
methods of measuring coverage. Finally, formulas (18) and (26), and formulas 
(23) and (31), may be compared to determine what length transect will give the 
same accuracy as a quadrat of given size. All such comparisons will necessarily 
have to be done numerically by considering typical values for the parameters 
involved. The moments occurring in these formulas are expressible by means 
of (29) in terms of $\mu$ and $V$ if the form of $f(r)$ is that assumed here. For the
data analyzed to determine $f(r)$ it was found that $V$ was approximately 1/3.
These numerical comparisons will not be made here. 
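Formula (29) can be checked against the gamma law. The gamma law is the type three curve with its origin at $r = 0$. That choice of origin is an assumption of this sketch, since a type three curve may in general start elsewhere. With shape $1/V^2$ and scale $\mu V^2$, the mean is $\mu$ and the coefficient of variation is $V$. The raw moments $\text{scale}^m\,\Gamma(\text{shape} + m)/\Gamma(\text{shape})$ should therefore reproduce (29). The value $V = 1/3$ below is the one reported for the shrub data, and $\mu = 2$ is an arbitrary illustration. The function names are hypothetical.

```python
import math


def nu_29(m, mu, V):
    """Formula (29): nu_m = mu^m * prod_{j=1}^{m-1} (1 + j V^2)."""
    return mu ** m * math.prod(1 + j * V * V for j in range(1, m))


def nu_gamma(m, mu, V):
    """Raw moment E(r^m) of a gamma law with mean mu and coefficient of variation V."""
    shape, scale = 1 / (V * V), mu * V * V
    return scale ** m * math.gamma(shape + m) / math.gamma(shape)


mu, V = 2.0, 1 / 3
for m in range(1, 5):
    print(m, nu_29(m, mu, V), nu_gamma(m, mu, V))
print(nu_29(1, mu, V) / nu_29(2, mu, V), 1 / (mu * (1 + V * V)))  # nu1/nu2 = 1/(mu theta)
```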

The question of which type of sampling method should be employed now
becomes a question of balancing relative ease or cost of sampling against the
sizes of samples needed to produce equivalent accuracy as determined by means of
these formulas. If total frequency is desired rather than total coverage, these 
formulas may be altered to handle this situation as well. 



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of general interest

Personal Items 

Dr. Charles C. Wagner has been named Assistant Dean of the School of Liberal 
Arts at the Pennsylvania State College. 

Miss Ruth E. Jolliffe has taken a position in the Graphic Analysis Department 
of Bell Aircraft Corporation. 

Mr. H. F. Hebley has been appointed Director of Research for the Pittsburgh 
Coal Company. 

Lt. F. W. Dresch, USNR, U. S. Naval Proving Ground, Dahlgren, Virginia, 
has been promoted to the rank of Lieutenant Commander. 

Mr. George F. Mayer is a Sergeant in the United States Army and is stationed 
at Fort Lewis, Washington. 

Captain A. C. Cohen, Jr. of Picatinny Arsenal has been promoted to the rank 
of Major. 


New Members 

The following persons have been elected to membership in the Institute: 

Bassford, Horace R. B.A. (Trinity Coll.) Actuary, Metropolitan Life Insurance Co., 1 
Madison Ave., New York, N. Y. 

Benson, Kathryn E. M.S. (Washington) Teaching Asst., Univ. of Calif., Berkeley, Calif. 

Blackadar, Walter L. B.A. (McMaster) Asso. Actuary, Equitable Life Assurance Society,
393 Seventh Ave., New York, N. Y.

Buros, Asso. Prof. Oscar K. M.A. (Columbia) Rutgers Univ. (on leave). Captain, Signal
Corps, A. U. S. SOI S. Courthouse Rd., Arlington, Va.

Clinedinst, William O. M.E. (Carnegie Inst. Tech.) Eng., National Tube Co., Frick 
Bldg., Pittsburgh, Pa. 

Curry, Prof. Haskell B. Ph.D. (Göttingen) Penna. State Coll., State College, Pa., 6708
N. Sixth St., Philadelphia, Pa.

Dix, Margaret J. M.A. (Rice Institute) Sec., Univ. of Calif., Berkeley, Calif. 

Groth, Alton O. M.S. (Iowa) Asst. Actuary, Equitable Life Insurance Co. of Iowa, 
Des Moines, Iowa. 

Gurland, John M.A. (Toronto) Instr., Univ. of Toronto, Toronto, Canada. 97 Metcalfe
St., Ottawa.

Humm, Doncaster G. Ph.D. (Southern California) 1203 Commercial Exchange Bldg., 
416 W. Eighth St., Los Angeles, Calif. 

Jahn, Fred S. M.S. (Florida) General Manager, New Plastic Corp., 1017 N. Sycamore, 
Hollywood, Calif. 

Jeming, Joseph M.A. (Columbia) Captain, Army Air Corps. 3010 Valentine Ave., New
York, N. Y.

Kavanagh, Arthur J. Ph.B. (Yale) Physicist, Spencer Lens Co., Buffalo, N. Y. 19 Doat
St.

Kennedy, Evelyn M. M.A. (Cincinnati) Industrial Economist, War Production Board,
Washington, D. C. 1463 Fairmont St., NW.

Lehmann, Erich L. M.A. (California) Asso., Univ. of Calif., Berkeley, Calif. 3614
Piedmont Ave.
Lew, Edward A. M.A. (Columbia) Asst. Actuary, Metropolitan Life Insurance Co., 1
Madison Ave., New York, N. Y. 

Murphy, Ray D. A.B. (Harvard) Vice Pres. and Actuary, Equitable Life Assurance
Society, New York, N. Y. ItS Godfrey Rd., Upper Montclair, N. J.

Myers, James E. A.B. (Michigan) Leader-Statistical Analysis Group, Naval Res. Lab.,
Washington, D. C. SOI 4 Nichols Ave., SE.

O'Connor, Harry W. M.B.A. (Harvard) Stat., Sperry Gyroscope Co. Inc., Brooklyn,
N. Y. S7 Meadow Woods Rd., Great Neck.

Painter, Frank M., Jr. M.B.A. (Harvard) Statistics Supervisor, Sperry Gyroscope Co., 
Brooklyn, N. Y. 343 82nd St. 

Salkind, William M.B.A. (Chicago) Asso. Stat., U. S. Dept. of Agric., Washington, D. C.
2149 K St., NW.

Simon, Leon G. Pension Consultant. 225 W. 34 St., New York, N. Y.

Stewart, Oscar F. Statistics Supervisor, Sperry Gyroscope Co., Brooklyn, N. Y. 

Tucker, Ledyard R. B.S. (Colorado) Res. Asso., Univ. of Chicago, Chicago, Ill. 5456
Greenwood Ave.

Ullman, Joseph L. B.A. (Buffalo) Teaching Fellow, Mass. Inst. of Tech., Cambridge,
Mass. 397 Jefferson Ave., Buffalo, N. Y.


REPORT ON THE WASHINGTON MEETING OF THE INSTITUTE 

The fifteenth meeting of the Institute of Mathematical Statistics was held at 
George Washington University, June 17-19, 1943. About 200 persons including 
the following sixty-one members of the Institute attended one or more of the 
three evening sessions: 

T. W. Anderson, Jorge Arias, R. O. Been, H. R. Beilinson, B. M. Bennett, Richard 
iperger, Joseph Berkson, Felix Bernstein, Archie Blake, Dorothy S. Brady, W. G. Cochran, 
J. B. Coleman, Gertrude Cox, J. H. Curtiss, G. B. Dantzig, Besse B. Day, Robert Dorfman,
H. F. Dorn, W. F. Elkin, W. D. Evans, R. H. Fadner, L. R. Frankel, M. A. Girshick, Harry
H. Goode, C. H. Graves, T. N. Greville, F. E. Grubbs, Louis Guttman, Morris H. Hansen,
W. A. Hendricks, W. N. Hurwitz, Walter Jacobs, Rachel M. Jenss, A. J. King, G. B. King,
Lila F. Knudsen, H. S. Konijn, Solomon Kullback, H. G. Landau, J. E. Lieberman, W. G.
Madow, Sophie Marcuse, J. W. Mauchly, A. M. Mood, Harold Nisselson, Monroe L. Norden,
H. W. Norton, A. C. Rosander, David Rosenblatt, P. J. Rulon, Marion Sandomire, W. A.
Shelton, Harry Shulman, J. H. Smith, G. W. Snedecor, F. F. Stephan, Alice Sternberg, 
Benjamin Tepping, J. W. Tukey, C. R. M. Tuttle, F. M. Weida. 

The following program, arranged by Dr. W. G. Madow, was held: 

THURSDAY, JUNE 17 AT 8:00 P.M. 

APPLICATIONS OF SAMPLING THEORY 
Chairman, Professor Frank M. Weida, George Washington University

1. Some Recent Developments in the Application of Sampling Theory in Agriculture 
Arnold J. King, Iowa State College and Department of Agriculture; Walter A. 
Hendricks, North Carolina State College and Department of Agriculture 

2. The Relative Efficiency of Block Samples in Housing Surveys 
Lester R. Frankel and William J. Cobb, Bureau of the Census 

3. The Optimum Size of Sampling Units 

Dorothy Cruden and Alice Sternberg, Bureau of the Census 
FRIDAY, JUNE 18 AT 8:00 P.M.

RECENT DEVELOPMENTS IN STATISTICAL THEORY 
Chairman, Professor George W. Snedecor, Iowa State College 

1. On Some Recent Developments in Sampling Theory 

Morris H. Hansen, William N. Hurwitz, and William G. Madow, Bureau of the 
Census and Office of Price Administration 

2. On the Variance of Estimates Arising from Stratified Samples
Frederick F. Stephan, War Manpower Commission 

3. Statistical Techniques for the Comparison of Different Scales of Measurement 
William G. Cochran, Iowa State College 

4. Adjustments for Differential Refusal Rates in Samples of Human Populations 
Jerome Cornfield, Bureau of Labor Statistics 

5. On the Verification of Weather Forecasts 
Horace W. Norton, Weather Bureau 

SATURDAY, JUNE 19 AT 8:00 P.M. 

SOME PROBLEMS IN STATISTICS 
Chairman, Colonel Leslie E. Simon, War Department

1. The Application of Statistical Methods in Acceptance Inspection 
Harold Beilinson, War Department 

2. The Distribution of the Radial Standard Deviation 
Captain Frank E. Grubbs, War Department

3. Some Results in Tests of Randomness 

M. A. Girshick, Department of Agriculture 

4. Corrections for Groupings 

John H. Smith, Bureau of Labor Statistics 

5. On Group Blood Testing 

Robert E. Dorfman, Office of Price Administration 

Edwin G. Olds, 
Secretary 


REPORT ON THE FIRST MEETING OF THE PITTSBURGH 
CHAPTER OF THE INSTITUTE 

The first meeting of the Pittsburgh Chapter of the Institute of Mathematical 
Statistics was held at Carnegie Institute of Technology on Saturday, June 19, 
1943. Thirty-six persons attended the meeting, including the following ten 
members of the Institute: 

Shirley Bernstein, M. A. Brumbaugh, Karl Fetters, H. J. Hand, G. E. Niver, F. G.
Norris, E. G. Olds, R. F. Passano, E. M. Schrock, R. W. Shephard.

Morning and afternoon sessions were devoted to a round-table discussion of 
present industrial uses of statistical methods. Mr. R. F. Passano, Bethlehem 
Steel Co., led the discussion. Mr. F. G. Norris, Wheeling Steel Corp., acted as 
chairman of the sessions. 
The Pittsburgh Chapter was formed from the Society of Industrial Quality
Statisticians, which has held meetings at Carnegie Institute of Technology since 
1941, with the object of providing a symposium for those interested in industrial 
applications. The Constitution of the Pittsburgh Chapter was ratified at the 
meeting. The object of the Chapter is to foster the advancement of mathe- 
matical statistics and to promote its application to industrial problems. 

The following officers for the Chapter for 1943 were elected: 

President, F. G. Norris, Wheeling Steel Corp. 

Vice President, K. L. Fetters, Carnegie Institute of Technology 
Sect.-Treas., H. J. Hand, National Tube Co.

Sponsor, E. G. Olds, Carnegie Institute of Technology 
Board Members, R. F. Passano, Bethlehem Steel Co.

J. Manuele, Westinghouse Electric & Mfg. Co.

Howard Hand, 

Secretary of the Pittsburgh Chapter
1. Introduction. In most of the problems of statistical inference for which
we possess solutions the distribution function is assumed to depend in a known 
way on certain parameters. The values of the parameters are unknown, and the 
problems are to make inferences about the unknown parameter values. We 
refer to this as the parametric case. Under it falls all the theory based on nor- 
mality assumptions. 

Only a very small fraction of the extensive literature of mathematical sta- 
tistics is devoted to the non-parametric case, and most of this is of the last 
decade. We may expect this branch to be rapidly explored however: The 
prospects of a theory freed from specific assumptions about the form of the 
population distribution should excite both the theoretician and the practitioner, 
since such a theory might combine elegance of structure with wide applicability. 
The process of development will no doubt inspire some mathematical attacks of 
considerable abstractness. There are already signs that more number-theoretic 
problems and measure-theoretic problems will enter our subject through this 
door, and perhaps even some topological ones. Some ability to think in terms of 

¹ Parts of this paper were used in an invited address given under the title "Statistical
inference when the form of the distribution function is unknown" before the meeting of the
Institute of Mathematical Statistics on September 12, 1943 in New Brunswick, N. J.
functionals, function spaces, and metrization of function spaces will be useful in
attempting general theories of ‘‘best’’ tests and estimates. Toward such ab- 
stract phases of the development the attitude of the practical statistician should 
be one of tolerance, for the new theory already promises to give him many new 
tools which are both simpler and of wider use. 

While the maturity of the non-parametric theory is still in the future, it is well 
to remark that its beginnings go relatively far back. Of our most famous tests, 
such as Pearson's $\chi^2$-test, Student's test, and Fisher's analysis of variance tests,
the oldest concerns a non-parametric problem: In 1900 Karl Pearson proposed
his $\chi^2$-criterion to test the goodness of fit of a theoretical distribution to
observations, and in 1911 he extended his $\chi^2$-method to the problem of two samples.
The first of these problems may be regarded as non-parametric if the choice of 
the theoretical distribution is not based on calculations from the data, and the 
second is without doubt a non-parametric problem. R. A. Fisher treated an 
analysis of variance problem non-parametrically at least as early as 1925, for in 
the first edition of his Statistical Methods for Research Workers we find the sign 
test. General formulations of the problems of statistical inference, and criteria 
for "good" and "best" solutions² have been advanced by R. A. Fisher, Neyman,
E. S. Pearson, and Wald. These general theories were all strictly parametric 
until 1941 when Wald proposed one sufficiently broad to cover the non-parametric 
case. 

We now introduce some notation to which we shall adhere throughout this 
paper. Statistical inferences are based on measurements. The total number
of measurements will always be denoted by $n$. We conceive of $n$ random
variables $X_1, X_2, \cdots, X_n$ on which the measurements are made. The domain
of each $X_j$ can always be taken to be a set of real numbers. If vector random
variables occur, the $X_j$ will denote components. The cumulative distribution
function (c.d.f.) of the random variables will be written $F_n(x_1, x_2, \cdots, x_n)$;
this is the probability that all $X_j < x_j$ simultaneously. The c.d.f. $F_n$ is then
always defined in a complete $n$-dimensional Euclidean space $W$, called the
sample space; $W$ is the space of points $E = (x_1, x_2, \cdots, x_n)$. The sample
point with the random coordinates $X_1, \cdots, X_n$ will be denoted by $E$.

In describing the validity of specific non-parametric tests and estimates in the
sequel it will be convenient to refer to the following classification³ of univariate
c.d.f.'s $F(x)$: $\Omega_1$ is the class of all $F$; $\Omega_2$ is the class of all continuous $F$; $\Omega_3$ is
the class of all absolutely continuous $F$, that is, those $F$ for which there exists a
probability density function $f(x)$, so that

$$F(x) = \int_{-\infty}^{x} f(t)\,dt;$$

$\Omega_4$ consists of all $F$ which may be written in the above form with $f$ continuous.


² For a bibliography see [22].
³ The notation follows [31].
Part I. Non-parametric Tests

2. The randomization method of obtaining similar regions. In any problem
of statistical inference it is assumed that the c.d.f. $F_n$ of the measurements is a
member of a given class $\Omega$ of $n$-variate distribution functions; we write $F_n \in \Omega$.
$\Omega$ is called the class of admissible $F_n$. If $\Omega$ is a $k$-parameter family of functions
the problem is called parametric; otherwise, non-parametric. A statistical
hypothesis $H$ is a statement that $F_n \in \omega$, where $\omega$ is a given subclass of $\Omega$. A test
of the hypothesis $H$ consists of choosing a Borel region $w$ in the sample space
$W$ and rejecting $H$ if and only if the sample point $E$ falls in $w$; $w$ is called the
critical region of the test.

The choice of the critical region $w$ is usually⁴ made as follows: A positive
constant $\alpha$ (ordinarily about .01 or .05) is chosen and called the significance level
of the test. If regions $w$ exist for which $\Pr\{E \in w \mid F_n\}$, the probability that the
sample point $E$ fall in $w$, calculated from the c.d.f. $F_n$, is equal to $\alpha$ for all $F_n \in \omega$,
then the choice of critical region is usually limited to this class. Such regions
are very important in the theory of testing hypotheses, and it is convenient to
have a name for them: Following the terminology of Neyman [22] in the
parametric case we shall call them similar to the sample space with respect to all $F_n$
in $\omega$, or more briefly, similar regions. A similar region is then a region $w$ for
which $\Pr\{E \in w \mid F_n\}$ is the same constant for all $F_n \in \omega$. The advantage of
using similar regions as critical regions is that the risk of rejecting the hypothesis
when it is true (type I error) is controlled: no matter what member of $\omega$ the
unknown $F_n$ happens to be, the probability of rejection of the (true) hypothesis
is exactly $\alpha$. We remark here that the problem of the existence and structure
of similar regions in the parametric case has been treated only under very heavy
restrictions and must be considered still mostly unsolved, whereas we shall see
later that in the non-parametric case it promises to be relatively simple.

When similar regions exist for a chosen $\alpha$ there is usually a large family of
them. Ideally the choice of the critical region $w$ from the family of similar
regions would be based on a complete knowledge of two functionals of $F_n$ for
$F_n \in \Omega - \omega$, that is, for those $F_n$ corresponding to the various admissible ways in
which the hypothesis can be false: the first, the probability of rejection (of avoiding
a type II error), namely $\Pr\{E \in w \mid F_n\}$, called the power function of $w$, and
the second, the relative importance of rejection in the concrete situation in which
the test is to be applied. In other words, one would like to choose the $w$ with the
power function "best" for the very specific problem at hand. However, little
has been done along this line in the non-parametric case, and, as we shall note
below, the choice of $w$ from the family of similar regions is usually made by
means of a statistic chosen on intuitive grounds.

A general method of obtaining similar regions, which we shall call the
randomization method, will now be described. The credit for originating this
method and envisioning its wide applicability belongs to R. A. Fisher, who first 

⁴ Another approach to the choice of critical region will be described in section 13.
used it in 1925 [3]. Consider the set $S$ of permutations on the coordinates
$x_1, x_2, \cdots, x_n$ which leave invariant all the c.d.f.'s $F_n$ in $\omega$. Suppose the
number of permutations in the set is $s$; then $s$ divides $n!$. Now define for any
point $E$ in $W$ a corresponding set $\{E'\}$ of $s$ points obtained by making on the
coordinates of $E$ the permutations of the set $S$. The value of the c.d.f. $F_n$ is
then the same at all $s$ points $E'$ generated by $E$, for all $E \in W$ and all $F_n \in \omega$.
The $s$ points of $\{E'\}$ will be distinct unless the point $E$ lies in a certain region
$W_0$; $W_0$ depends on the set $S$ of permutations determined by the class $\omega$, and will
always be contained in the union of all diagonal hyperplanes $x_i = x_j$ $(i \neq j)$.
A critical region $w$ is constructed by the randomization method by choosing a
positive integer $q < s$, and for every $E$ not in $W_0$, putting $q$ points of the
corresponding set $\{E'\}$ in $w$ and the remaining $s - q$ points outside $w$, by any rule
whatever, just so $w$ is a Borel set. We shall also say that a Borel set $w$ is
obtained by the randomization method if it has the structure just described except
on a (Borel) subset $w_0$ of $W$ having the property $\Pr\{E \in w_0 \mid F_n\} = 0$ for all
$F_n \in \omega$. It may be shown by the methods used elsewhere [31] by the writer
that if $\omega$ is a class of continuous c.d.f.'s then the region $w$ thus obtained is a
similar region with $\alpha = q/s$; furthermore, that under mild restrictions (roughly,
that the boundary of $w$ be a sufficiently "thin" set), at least for certain classes $\omega$,
this is the only method of obtaining similar regions.

One might call the set $\{E'\}$ of points corresponding to $E$ the subpopulation of
points "equally likely" under the null hypothesis $H$, but we shall call $\{E'\}$ simply
the subpopulation corresponding to $E$. The decision as to which $q$ of the $s$ points
of the subpopulation are to be put into the critical region $w$ is usually made with
the aid of a statistic $T$ chosen on an intuitive basis. By a statistic $T$ we mean of
course a function of the sample only, not depending on the c.d.f. $F_n$; thus
$T(E) = T(X_1, \cdots, X_n)$. For a suitably chosen $q$, the $q$ points of the
subpopulation $\{E'\}$ giving $T(E')$ values in a certain range (usually the $q$ largest or
$q$ smallest values) are put into $w$, and these $q$ values are then called the
"significant" values.
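The construction can be sketched directly for a small sample. In the illustrative Python sketch below, `perms` plays the role of the set $S$, and the subpopulation $\{E'\}$ is formed by applying each permutation to $E$. $E$ is placed in $w$ when fewer than $q$ points of its subpopulation have a larger value of the statistic. The example assumes that $S$ is the set of all $6!$ permutations, as it would be for a hypothesis that the six coordinates are exchangeable (two samples of three from one population, say). It uses a hypothetical statistic, the first-sample sum minus the second-sample sum. This rule puts exactly $q$ points of each subpopulation in $w$ only when $q$ does not split a group of tied $T$ values. Here each value of $T$ occurs $3!\,3! = 36$ times, so $q = 36$ gives $\alpha = 36/720 = .05$.

```python
from itertools import permutations


def subpopulation(E, perms):
    """The s points E' obtained by applying each permutation in S to E."""
    return [tuple(E[i] for i in p) for p in perms]


def in_critical_region(E, perms, T, q):
    """True if fewer than q points of the subpopulation of E have a larger T."""
    t0 = T(E)
    larger = sum(1 for Ep in subpopulation(E, perms) if T(Ep) > t0)
    return larger < q


def T(E, m=3):
    """Hypothetical statistic: sum of the first m coordinates minus the rest."""
    return sum(E[:m]) - sum(E[m:])


S = list(permutations(range(6)))   # s = 6! = 720
E = (32, 16, 8, 4, 2, 1)           # distinct subset sums, so no accidental ties
q = 36                             # alpha = q/s = .05
count_in_w = sum(in_critical_region(Ep, S, T, q) for Ep in subpopulation(E, S))
print(in_critical_region(E, S, T, q), count_in_w)
```

Exactly $q$ of the $s$ points of the subpopulation fall in $w$, whatever the observed $E$ (with distinct coordinates). This is why the rejection probability is $q/s$ under every continuous $F_n$ in $\omega$.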

Before proceeding further let us consider an example illustrating all the
definitions we have introduced thus far. Suppose that on the basis of a sample of $m$
pairs $(X_i, Y_i)$, $i = 1, 2, \cdots, m$, from a bivariate population with unknown
c.d.f. $G(x, y)$ we wish to test the independence of the random variables $X, Y$.
To fit our general notation write $Y_i = X_{i+m}$. Assuming only that the sample
is random, we have, with $n = 2m$, that the c.d.f. of the sample point $E$ is of the
form

$$F_n(x_1, \cdots, x_n) = \prod_{i=1}^{m} G(x_i, x_{i+m}).$$

Now suppose we know or are willing to assume further that the unknown c.d.f.
$G(x, y)$ of the population is in a certain class $\Omega_j^{(2)}$ of bivariate c.d.f.'s, where $\Omega_j^{(2)}$
is the bivariate analogue of the class $\Omega_j$ of univariate c.d.f.'s defined in section
1; thus if we knew the unknown $G(x, y)$ were continuous, we would have $G \in \Omega_2^{(2)}$.
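The class $\Omega$ of admissible $F_n$ is then

$$\Omega = \Big\{F_n \,\Big|\, F_n = \prod_{i=1}^{m} G(x_i, x_{i+m});\ G \in \Omega_j^{(2)}\Big\},$$

where the notation $\{F_n \mid \mathfrak{F}\}$ denotes the class of all $F_n$ of the
form $\mathfrak{F}$. The hypothesis of independence may now be expressed as $H: F_n \in \omega$,
where the subclass $\omega$ of $\Omega$ is

$$\omega = \Big\{F_n \,\Big|\, F_n = \prod_{i=1}^{m} F^{(1)}(x_i)\prod_{j=m+1}^{2m} F^{(2)}(x_j);\ F^{(1)}(x)F^{(2)}(y) \in \Omega_j^{(2)}\Big\}.$$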



The set $S$ of permutations which leave all $F_n \in \omega$ invariant is obtained by
making all possible permutations of the first $m$ coordinates $x_1, \cdots, x_m$ among
themselves, and of the second $m$ coordinates $x_{m+1}, \cdots, x_{2m}$ among themselves. The
total number $s$ of permutations in $S$ is thus $(m!)^2$. Making these permutations
on the coordinates of any point $E$ in $W$, we get the set $\{E'\}$ of $(m!)^2$ points. The
points of $\{E'\}$ are distinct unless $E$ lies in the region $W_0$ defined as the union
of all hyperplanes $x_i = x_j$ where $i \neq j$ and $i$ and $j$ are both in the set of integers
$1, 2, \cdots, m$ or else both in the set $m + 1, \cdots, 2m$. Pitman [28] applied the
randomization method to this problem, using as the statistic $T(E)$ the
numerical value of the (sample) Pearsonian correlation coefficient,


$$T(E) = \Big|\sum_{i=1}^{m} x_i x_{i+m}\Big| \Big/ \Big(\sum_{i=1}^{m} x_i^2 \sum_{j=m+1}^{2m} x_j^2\Big)^{1/2},$$


the large values of $T$ being the significant ones. We note that $T(E)$ takes on
at most $m!$ different values over the subpopulation. What we previously called
a "suitably chosen" $q$ would be in the present case a multiple of $m!$, and the
choice of significance level $\alpha = q/s$ would then be limited to multiples of $1/m!$.
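For small $m$ the exact test can be carried out by enumeration. Permuting both halves of $E$ only repeats each pairing $m!$ times, so it suffices to pair the $x$'s with all $m!$ orderings of the $y$'s. The sketch below is an illustration and does not reproduce Pitman's own computation. It assumes that the correlation is computed from deviations about the sample means, and the helper names are hypothetical.

```python
import math
from itertools import permutations


def pearson_r(x, y):
    """Sample Pearsonian correlation, from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pitman_fraction(x, y, tol=1e-12):
    """Fraction of the m! pairings whose |r| is at least the observed |r|.

    Rejecting when this fraction is at most alpha = q/s, with q a multiple
    of m!, gives the region described in the text (barring accidental ties).
    """
    r0 = abs(pearson_r(x, y))
    pairings = list(permutations(y))
    hits = sum(1 for yp in pairings if abs(pearson_r(x, yp)) >= r0 - tol)
    return hits / len(pairings)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pitman_fraction(x, y))  # 2/120: only the observed pairing and its reverse reach |r| = 1
```

With $m = 5$ the attainable levels are multiples of $1/120$, as the text notes.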
The method of randomization is seen to exploit whatever symmetry properties
the $F_n$ in $\omega$ possess as a class. A special case of the general method is the method
of ranks. This gives regions of an especially simple form defined by certain
inequalities on the coordinates. Probably the only case in which the method of
ranks will ever be used is when the $F_n$ in $\omega$ have the following special kind of
symmetry: Suppose they are completely symmetrical in each of certain subsets
of the coordinates, say $t$ sets of $n_1, n_2, \cdots, n_t$ coordinates, respectively, where
$\sum_{i=1}^{t} n_i = n$. We may assume the coordinates numbered so that $F_n$ is
completely symmetrical in the set $x_{p_i+1}, x_{p_i+2}, \cdots, x_{p_i+n_i}$
$(p_i = n_1 + \cdots + n_{i-1},\ i = 2, 3, \cdots, t;\ p_1 = 0)$, for all $F_n \in \omega$. The set $S$ of
permutations is thus generated by making all $n_i!$ permutations on the $n_i$ coordinates
$x_{p_i+1}, \cdots, x_{p_i+n_i}$ $(i = 1, \cdots, t)$, so that the total number of permutations in $S$
is $s = n_1!\,n_2!\cdots n_t!$.

Corresponding to the $i$-th set of coordinates in which $F_n$ is symmetrical, let
us divide the sample space $W$ up into $n_i!$ regions defined by

$$x_{p_i+1} < x_{p_i+2} < \cdots < x_{p_i+n_i},$$
and the $n_i! - 1$ other inequalities obtained by permuting the subscripts in the
above. Denote these regions by $W_{i,k}$ $(k = 1, \cdots, n_i!)$. Let

$$W_{k_1k_2\cdots k_t} = W_{1,k_1} \cap W_{2,k_2} \cap \cdots \cap W_{t,k_t},$$

that is, $W_{k_1k_2\cdots k_t}$ is the part of $W$ common to the regions $W_{1,k_1}, W_{2,k_2}, \cdots,
W_{t,k_t}$. This process divides the sample space $W$ up into $s$ disjoint regions
$W_{k_1k_2\cdots k_t}$, which we shall now denote simply by $w_\sigma$ $(\sigma = 1, \cdots, s)$. The set
$\{w_\sigma\}$ of regions covers all of the sample space $W$ except the region $W_0$ on which
certain coordinates become equal. We shall say that the sample point $E$ has
the $\sigma$-th ranking, $R_\sigma$, if $E$ falls in $w_\sigma$. We may then speak of a random variable
$R = R(E)$, the "ranking", taking on the $s$ possible values $R_\sigma$, or the "tied"
ranking $R_0$ if $E \in W_0$.

A critical region $w$ is constructed by the method of ranks by taking $w$ to be
the union of $q$ of the regions $w_\sigma$. Those rankings $R_\sigma$ corresponding to the $q$
regions $w_\sigma$ constituting the critical region $w$ will be called the significant
rankings. Any statistic $T(E)$ used as the criterion to decide which are the significant
rankings now becomes a function of the ranking $R$ only, $T(E) = U(R)$. We
may regard the method of ranks as a simplification of the problem of testing
statistical hypotheses, in which the infinite $n$-dimensional sample space $W$ is
replaced by a finite space of $s + 1$ points $R_0, R_1, \cdots, R_s$. If $\Omega$ is a class of continuous $F_n$
we may ignore the point $R_0$ since then $\Pr\{R = R_0\} = 0$.

In the problem of independence, which we have used before to illustrate the 
definitions of this section, the method of ranks was applied by Hotelling and 
Pabst [9], who took as the statistic $U(R)$ the numerical value of the Spearman
coefficient of rank correlation, large values being significant. 
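In the method of ranks only the ranking $R$ matters, so the exact distribution of $U(R)$ can be obtained by enumerating the $m!$ relative orderings. The following is a hedged sketch of such an exact test for the numerical value of Spearman's coefficient. It assumes no ties, is not taken from Hotelling and Pabst, and uses hypothetical function names.

```python
from itertools import permutations


def ranks(v):
    """Ranks 1..m of the values in v (ties are assumed absent)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(rx, ry):
    """Spearman coefficient 1 - 6 sum d^2 / (m (m^2 - 1)) from two rank vectors."""
    m = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (m * (m * m - 1))


def spearman_exact_fraction(x, y, tol=1e-12):
    """Fraction of the m! equally likely relative rankings with |rho| >= observed."""
    m = len(x)
    u0 = abs(spearman(ranks(x), ranks(y)))
    base = list(range(1, m + 1))
    vals = [abs(spearman(base, list(p))) for p in permutations(base)]
    return sum(1 for u in vals if u >= u0 - tol) / len(vals)


x = [3.1, 0.4, 2.2, 5.0]
y = [30.0, 4.0, 22.0, 50.0]
print(ranks(x), ranks(y))             # [3, 1, 2, 4] [3, 1, 2, 4]
print(spearman_exact_fraction(x, y))  # 2/24
```

The enumeration runs over $m!$ rankings instead of the $(m!)^2$ points of the subpopulation, because $U(R)$ depends only on how the second set of ranks is ordered relative to the first.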

The method of randomization yields similar regions if $\omega$ is a class of continuous
functions. What will the method get us if we drop the continuity restriction?
In this case we can no longer ignore the possibility that the sample point $E$ fall
in the exceptional region $W_0$, for we do not have $\Pr\{E \in W_0\} = 0$. We owe to
Pitman [27] the following idea: We continue to use the subpopulation $\{E'\}$ and
a chosen statistic $T(E)$ as above, but instead of separating the points of $\{E'\}$
into two classes (significant points and non-significant points) by means of $T(E)$
we now add a third class of "doubtful" points.⁵ If the $s$ points of the set $\{E'\}$
are not distinct they are to be counted according to their multiplicities under the
process of applying the permutations of the set $S$ to the coordinates of $E$.
Suppose that the large values of $T$ are significant. Number the $s$ points of $\{E'\}$ so
that $T(E'_1) \ge T(E'_2) \ge \cdots \ge T(E'_s)$. If $T(E'_q) > T(E'_{q+1})$ we call $E'_1, \cdots,
E'_q$ significant, and the rest non-significant. However if $T(E'_q) = T(E'_{q+1})$, we
term all points $E'$ with $T(E') = T(E'_q)$ doubtful, points $E'$ for which $T(E') >
T(E'_q)$, significant, and points $E'$ with $T(E') < T(E'_q)$, non-significant. This
process divides the sample space W up into three regions instead of the customary 

⁵ Instead of the terms significant, non-significant, doubtful, Pitman uses discordant,
concordant, neutral.
two, namely, a rejection region $w_R$, an acceptance region $w_A$, and a doubtful
region $w_D$. It is a special case of the following procedure: For every set $\{E'\}$
define positive integers $m_R$ and $m_A$ (depending on $\{E'\}$) such that $m_R \le q$,
$m_A \le s - q$, and put $m_R$ of the points $E'$ in $w_R$, $m_A$ of the points $E'$ in $w_A$,
and the remaining $s - m_A - m_R$ of the points $E'$ in $w_D$, in any way so that $w_R$
and $w_A$ are Borel regions. When any $E'$ is assigned to $w_R$ or $w_A$ it is to be counted
according to its multiplicity as defined above, if $\{E'\}$ contains less than $s$
distinct points. It may be shown that with $\alpha = q/s$, $\Pr\{E \in w_R \mid F_n\} \le \alpha$ and
$\Pr\{E \in w_A \mid F_n\} \le 1 - \alpha$ for all $F_n \in \omega$, that is, whenever $H$ is true.
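Pitman's three-way classification can be sketched as a small function of the $s$ values of $T$ over a subpopulation, listed with multiplicity. The function returns the counts of significant, doubtful, and non-significant points. In the general procedure these counts play the roles of $m_R$, the size of the doubtful set, and $m_A$. The function name is hypothetical, and the example values are made up for illustration.

```python
def pitman_split(tvals, q):
    """Return (significant, doubtful, non_significant) counts for Pitman's rule.

    tvals: the s values T(E') over the subpopulation, counted with multiplicity.
    Large values of T are taken as significant; requires 1 <= q < s.
    """
    t = sorted(tvals, reverse=True)
    s = len(t)
    if not 1 <= q < s:
        raise ValueError("need 1 <= q < s")
    if t[q - 1] > t[q]:              # no tie at the boundary: the ordinary split
        return q, 0, s - q
    c = t[q - 1]                     # the tied boundary value
    significant = sum(1 for v in t if v > c)
    doubtful = sum(1 for v in t if v == c)
    return significant, doubtful, s - significant - doubtful


print(pitman_split([4, 3, 2, 1, 0, 0], 2))  # (2, 0, 4)
print(pitman_split([4, 3, 3, 3, 1, 0], 2))  # (1, 3, 2)
```

In both cases the significant count never exceeds $q$ and the non-significant count never exceeds $s - q$. The two probability bounds stated above rest on these inequalities.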

Before closing this section on the method of randomization, we mention a few
difficulties which frequently arise when it is applied. Except for very small
samples the calculation determining whether or not the observed value $E_0$ of
the sample point $E$ belongs to the significant points of the subpopulation $\{E'\}$
generated by $E_0$ is usually extremely tedious. In such cases the author of the
test often gives an approximation to the discrete distribution of his statistic
$T(E)$ over the subpopulation $\{E'\}$ by means of some familiar continuous
distribution for which tables are available, the laborious exact calculation by
enumeration then being replaced by the computation of a few moments (that is,
values of certain homogeneous polynomials in the observed coordinates) and the
use of existing tables of percentage points of the continuous distribution.⁶
Barring some papers where the method of ranks is used, the justification of these
approximations is never satisfactory from a mathematical point of view, the
argument being based on a study of the behavior of two, or at most four,
moments. The only exception to the last statement appears to be a very recent
paper by Wald and Wolfowitz [42], which may point the way to genuine
derivations of asymptotic distributions for the non-rank case of the randomization
method. We shall distinguish between derivations of asymptotic distributions
and arguments based on two or four moments by saying that a distribution is
"proved" in the former case and "fitted" in the latter.
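To illustrate what a "fitted" distribution rests on, the first two moments of the subpopulation distribution of Pitman's coefficient can be computed by enumeration. Take any data in which neither the $x$'s nor the $y$'s are all equal. Over the $m!$ pairings, the mean of $r$ is $0$ and its variance is $1/(m - 1)$, so a fitted continuous distribution would be matched to these two values. The sketch below checks this for $m = 5$, using hypothetical names and made-up data. It does not by itself justify any particular fitted form.

```python
import math
from itertools import permutations


def pearson_r(x, y):
    """Sample Pearsonian correlation, from deviations about the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_moments(x, y):
    """Mean and variance of r over all m! pairings of x with orderings of y."""
    rs = [pearson_r(x, yp) for yp in permutations(y)]
    mean = sum(rs) / len(rs)
    var = sum((r - mean) ** 2 for r in rs) / len(rs)
    return mean, var


x = [1.0, 2.5, 2.9, 4.2, 7.0]
y = [0.3, 1.1, 0.8, 2.6, 1.9]
mean, var = permutation_moments(x, y)
print(mean, var, 1 / (len(x) - 1))
```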

Another difficulty arises, most noticeably in the method of ranks, out of the 
possibility of equality of the observed coordinates. In the distribution theory 
this is usually avoided by assuming $\omega$ to be a class of continuous c.d.f.'s, so that
$\Pr\{E \in W_0 \mid F_n\} = 0$ for all $F_n \in \omega$, but in practice, since the measurements are
usually made to about three significant figures, ties do occur in the sample. 
While some scattered work has been done on this question there is need for a 
thorough general treatment. 

* In many cases the approximate test obtained by fitting a familiar distribution is found to coincide with widely used tests based on normality assumptions. In such cases, if the fitting is asymptotically correct, the following remarks are justified: (1) If the non-parametric test is used in a case where we hesitate to assume normality but normality actually exists, the non-parametric test is asymptotically as efficient as the older test assuming normality. (2) If normality is assumed when it does not exist, no error is incurred asymptotically when the older test is used.

In some of the work that has been done on particular non-parametric tests it is not very clear just what the null hypothesis H is. Two situations often occur. Suppose H: F_n ∈ ω is the hypothesis we actually wish to test at significance level α. Let w be the chosen critical region and ω_w the class of F_n for which Pr{E ∈ w | F_n} = α. The two situations are (i) ω is a proper subset of ω_w, and (ii) ω_w is a proper subset of ω. Of these, (i) seems less objectionable, for then the probability of a type I error (rejecting H when true) is strictly α, but the probability of accepting H is the same when certain alternatives (F_n ∈ ω_w − ω) are true as when H is true. In case (ii) the probability of a type I error is not α unless F_n is in the subclass ω_w of ω; thus there might be a much higher probability than α of rejecting H when it is true, if the true F_n ∈ ω − ω_w. To illustrate situation (i), consider K. Pearson's χ²-test for goodness of fit of a theoretical distribution F_0(x) to a sample E. Suppose E is from a univariate population whose true c.d.f. is F(x). If F has the property that for the intervals I_j defined in section 3

$$\int_{I_j} dF = \int_{I_j} dF_0, \qquad j = 1, 2, \dots, N,$$

then the probability of rejection is the same as when the hypothesis is true. An example of (ii) might occur if we wish to test whether the means of two univariate populations are the same. If we use one of the tests of section 4 in which the probability of rejection is calculated under the assumption that the distributions of the populations are the same, then we do not know that the probability of a type I error is α, for the samples might come from two populations with the same mean but different distributions.

3. Goodness of fit. Randomness. The non-parametric case of testing goodness of fit is the following: On the basis of a sample E from a population with c.d.f. F(x) known to be a member of some Ω_ν, we wish to test whether F = F_0, where F_0 is a given c.d.f. The class of admissible c.d.f.'s F_n is

$$\Omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{n} F(x_i);\ F \in \Omega_\nu\Big\},$$

and the hypothesis H specifies that F_n ∈ ω, where

$$\omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{n} F_0(x_i)\Big\}.$$

K. Pearson's χ²-test [25] consists of choosing an integer N, dividing the x-axis up into a set {I_j} of disjoint intervals (j = 1, 2, …, N), and using as statistic T(E) the Pearsonian chi square,

$$\chi_P^2 = \sum_{j=1}^{N} \frac{[m_j - \mathcal{E}(m_j)]^2}{\mathcal{E}(m_j)},$$

where m_j is the number of observed coordinates of E in I_j, and ℰ(m_j) = n ∫_{I_j} dF_0. Large values of χ_P² are regarded as significant. Exact significance levels for χ_P² could be obtained by considering its distribution over the subpopulation {E'} generated by the sample. This process would lead to the multinomial distribution of the m_j mentioned in the usual derivations of the asymptotic distribution of χ_P² (for n → ∞ with N fixed). Pearson himself found this asymptotic distribution, namely the χ²-distribution with N − 1 degrees of freedom. In studying the problem of a “best” choice of the set {I_j} of intervals, Mann and Wald [17] adopted a non-parametric treatment, with ν = 2 for the class Ω above.
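To make the exact computation concrete, here is a small Python sketch of our own (all function names are hypothetical). It bins a sample into intervals I_j given by cut points, forms χ_P² with ℰ(m_j) = n p_j where p_j = ∫_{I_j} dF_0, and finds the exact significance level under H by enumerating every multinomial outcome (m_1, …, m_N). The interval probabilities are assumed to be passed as exact fractions so that outcomes tying with the observed χ_P² are compared exactly; enumeration is of course feasible only for small n and N.

```python
from bisect import bisect_right
from fractions import Fraction
from math import factorial

def bin_counts(sample, cuts):
    """m_j for the intervals (-inf, c_1), [c_1, c_2), ..., [c_{N-1}, +inf)."""
    counts = [0] * (len(cuts) + 1)
    for x in sample:
        counts[bisect_right(cuts, x)] += 1
    return counts

def chi_square(counts, probs):
    """Pearson's chi square with E(m_j) = n * p_j."""
    n = sum(counts)
    return sum((m - n * p) ** 2 / (n * p) for m, p in zip(counts, probs))

def compositions(n, parts):
    """Every tuple of `parts` non-negative integers summing to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest

def multinomial_prob(counts, probs):
    """Multinomial probability of the outcome `counts` under H."""
    coef = factorial(sum(counts))
    for m in counts:
        coef //= factorial(m)  # each partial quotient is an integer
    prob = Fraction(coef)
    for m, p in zip(counts, probs):
        prob *= p ** m
    return prob

def exact_chi_square_level(counts, probs):
    """Exact Pr{chi square >= observed value} under H, by full enumeration."""
    observed = chi_square(counts, probs)
    return sum(multinomial_prob(c, probs)
               for c in compositions(sum(counts), len(counts))
               if chi_square(c, probs) >= observed)

half = Fraction(1, 2)
print(exact_chi_square_level((3, 0), (half, half)))  # 1/4
```

For larger n the number of outcomes grows rapidly, which is why the limiting χ²-distribution with N − 1 degrees of freedom is used in practice.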

Another test not depending on a choice of intervals I_j could be made by using confidence belts for F as described in section 9 and rejecting H at the α level of significance if the graph of F_0 is not covered by the belt with confidence coefficient 1 − α.

The problem of randomness is usually non-parametric; in the univariate case the class ω of this problem is identical with the class Ω of the preceding problem. The index ν and the class Ω for the problem of randomness would depend on the specific situation in which it arises. With two exceptions [42, 52], all tests of randomness proposed thus far have been functions of runs in the sample. Two kinds of runs have been considered: runs up and down, and runs above and below the median [1, 4, 14, 19, 32, 44, 51]. We note that the set S of permutations determined by ω is the set of all n! permutations on the n coordinates of E. Suppose now ν = 2. The proof [31] that all similar regions w have the randomization structure applies to this problem. On the other hand, such a region w has the property Pr{E ∈ w | F_n} = α for any F_n which is completely symmetrical in the coordinates. Difficulty (i) discussed at the end of section 2 now arises if Ω contains such symmetrical alternatives. The definition of an appropriate class Ω − ω of alternatives and the question of the power of tests against the alternatives make the problem of randomness a difficult one. Beyond these few remarks we refer the reader to an expository paper by Wolfowitz [51] devoted to the problem in the previous issue of this journal, and to a paper by Wald and Wolfowitz [42] in the present issue. The latter paper is one of the exceptions, previously mentioned, not based on the method of ranks.
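The two kinds of runs can be counted as in the following Python sketch (function names are ours). Dropping observations equal to the sample median, and zero steps in the up-and-down count, is our own simplifying assumption for handling ties; the significance tables for these counts are in the cited papers and are not reproduced here.

```python
from statistics import median

def count_runs(symbols):
    """Number of maximal blocks of equal consecutive symbols."""
    if not symbols:
        return 0
    return 1 + sum(1 for a, b in zip(symbols, symbols[1:]) if a != b)

def runs_above_below_median(xs):
    """Runs of 'above' / 'below' the sample median, in the order observed.
    Observations equal to the median are dropped (an assumption)."""
    med = median(xs)
    return count_runs([x > med for x in xs if x != med])

def runs_up_and_down(xs):
    """Runs of rising / falling steps between successive observations.
    Zero steps are dropped (an assumption)."""
    return count_runs([b > a for a, b in zip(xs, xs[1:]) if b != a])

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(runs_above_below_median(data), runs_up_and_down(data))  # 6 6
```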

4. The problem of two samples. Suppose X_1, …, X_{m_1} and Y_1, …, Y_{m_2} are samples from univariate populations with c.d.f.'s F(x) and G(x) respectively, where we assume F, G ∈ Ω_ν, and that we wish to test the hypothesis that F ≡ G. Write Y_i = X_{m_1+i}, so that with n = m_1 + m_2 we have

$$\Omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{m_1} F(x_i) \prod_{j=m_1+1}^{n} G(x_j);\ F, G \in \Omega_\nu\Big\},$$

$$\omega = \Big\{F_n \;\Big|\; F_n = \prod_{i=1}^{n} F(x_i);\ F \in \Omega_\nu\Big\}.$$

The set S of permutations determining the subpopulation {E'} consists of all n! permutations on the n coordinates of E. The writer has shown [31] that no similar regions exist in this case if ν = 0, while if ν = 2, 3, or 4 a similar region necessarily has the randomization structure.

The first non-parametric attack on this problem was given [26] by K. Pearson. The x-axis is divided up into intervals I_1, …, I_N as in section 3. Let m_{j1} and m_{j2} be the number of measurements from the first and second samples, respectively, falling in I_j, so that Σ_j m_{jk} = m_k, k = 1, 2. The statistic T(E) used is

$$\chi_P'^2 = (m_1 m_2)^{-1} \sum_{j=1}^{N} \frac{(m_1 m_{j2} - m_2 m_{j1})^2}{m_{j1} + m_{j2}},$$

with large values significant. In view of the remarks at the end of the last paragraph, it would be necessary to calculate the distribution of χ_P'² over the subpopulation in order to get a similar region. Pearson found the asymptotic distribution of χ_P'² under the null hypothesis to be the χ²-distribution with N − 1 degrees of freedom.
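A short Python check (ours, not Pearson's; function names hypothetical) makes the formula concrete: it computes χ_P'² from the interval counts and compares it with the ordinary 2 × N contingency-table chi square, with which it agrees algebraically. Intervals containing no observation from either sample are skipped, an assumption needed to avoid division by zero.

```python
from fractions import Fraction

def two_sample_chi_square(first, second):
    """(m1 m2)^-1 * sum_j (m1 m_j2 - m2 m_j1)^2 / (m_j1 + m_j2)."""
    m1, m2 = sum(first), sum(second)
    total = Fraction(0)
    for a, b in zip(first, second):
        if a + b:  # empty intervals contribute nothing
            total += Fraction((m1 * b - m2 * a) ** 2, a + b)
    return total / (m1 * m2)

def contingency_chi_square(first, second):
    """Usual 2 x N contingency-table chi square, for comparison."""
    m1, m2 = sum(first), sum(second)
    n = m1 + m2
    total = Fraction(0)
    for a, b in zip(first, second):
        t = a + b
        if t:
            for observed, mk in ((a, m1), (b, m2)):
                expected = Fraction(mk * t, n)
                total += (observed - expected) ** 2 / expected
    return total

first = [5, 3, 2]    # m_j1: first-sample counts in I_1, I_2, I_3
second = [1, 4, 5]   # m_j2
print(two_sample_chi_square(first, second))  # 86/21
```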

A solution based on the method of randomization was proposed by Pitman [27]; the special case of this solution for m_1 = m_2 was published a little earlier by R. A. Fisher [6]. Pitman employed the numerical value of the difference of the sample means as statistic,

$$T(E) = \Big|\sum_{i=1}^{m_1} x_i/m_1 - \sum_{j=m_1+1}^{n} x_j/m_2\Big|,$$

large values being significant. He fitted an incomplete Beta-distribution to the subpopulation distribution of his T(E), and noted that this approximation gave a result identical with the usual t-test valid when the population distributions F(x) and G(x) are assumed normal with equal variances.
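When the samples are small, the exact subpopulation distribution of T(E) can be enumerated instead of fitted. The Python sketch below is our illustration (function names hypothetical). Since T(E) depends only on which coordinates are assigned to the first sample, each of the C(n, m_1) splits stands for m_1! m_2! of the n! equally likely permutations, so the splits themselves are equally likely. Exact fractions are used so that values of T equal to the observed one are counted reliably.

```python
from fractions import Fraction
from itertools import combinations

def abs_mean_difference(first, second):
    return abs(sum(first) / len(first) - sum(second) / len(second))

def pitman_exact_level(xs, ys):
    """Exact Pr{T >= observed T} over the equally likely splits of the pooled sample."""
    pooled = [Fraction(v) for v in list(xs) + list(ys)]
    n, m1 = len(pooled), len(xs)
    observed = abs_mean_difference(pooled[:m1], pooled[m1:])
    hits = total = 0
    for chosen in combinations(range(n), m1):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(n) if i not in chosen_set]
        total += 1
        if abs_mean_difference(first, second) >= observed:
            hits += 1
    return Fraction(hits, total)

print(pitman_exact_level([1, 2, 3], [4, 5, 6]))  # 1/10
```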

Turning now to tests based on the method of ranks, we mention here that one for the case m_1 = m_2 was given by R. A. Fisher as early as 1925, namely the “sign test” or “binomial series test” [3]. We may (and Fisher did) regard this as a test of a less restrictive hypothesis, and shall describe it in section 6. Between 1938 and 1940 several tests employing ranks were proposed for the problem of two samples. The earliest of these, by W. R. Thompson [36], was shown by Wald and Wolfowitz [40] to be inconsistent (section 11) with respect to certain alternatives F_n ∈ Ω − ω. These authors used as statistic U(R) the total number of runs in a sequence V of n elements constructed as follows: Rank the measurements of the combined sample in order of increasing magnitude. According as the j-th measurement in this rank order is from the first or second sample, put the j-th element of the sequence V equal to 1 or 2. In this test small values of the statistic U(R) are regarded as significant. The test is now quite practicable (for ν = 2) for certain ranges of m_1 and m_2. For m_1 and m_2 both ≤ 20, tables by Swed and Eisenhart [34] give the 1% and 5% significant values of U(R). Wald and Wolfowitz proved that for n → ∞ with k = m_1/m_2 fixed, the distribution of U(R) is asymptotically normal with mean 2m_1/(1 + k) and variance 4km_1/(1 + k)³. Swed and Eisenhart computed that for m_1 = m_2 this gives a very satisfactory approximation outside the range of their tables. However, further computation needs to be done on the accuracy of this approximation for m_1 ≠ m_2 and one of them > 20.
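The runs statistic is easily computed. The sketch below (Python, function names hypothetical) builds the sequence V, counts its runs, and evaluates the Wald-Wolfowitz asymptotic mean and variance quoted above, together with the standard exact null mean 1 + 2m_1m_2/n for comparison. It assumes no ties between the two samples, in line with the continuity assumption discussed in section 2.

```python
def runs_statistic(xs, ys):
    """U(R): number of runs in the sequence V of sample labels (1 or 2)
    read off the pooled sample in increasing order; assumes no ties."""
    pooled = sorted([(x, 1) for x in xs] + [(y, 2) for y in ys])
    v = [label for _, label in pooled]
    return 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)

def asymptotic_moments(m1, m2):
    """Mean 2 m1/(1+k) and variance 4 k m1/(1+k)^3, with k = m1/m2."""
    k = m1 / m2
    return 2 * m1 / (1 + k), 4 * k * m1 / (1 + k) ** 3

def exact_mean(m1, m2):
    """Exact null mean of the number of runs, 1 + 2 m1 m2 / n."""
    return 1 + 2 * m1 * m2 / (m1 + m2)

print(runs_statistic([1, 2, 3], [4, 5, 6]))  # 2
print(runs_statistic([1, 3, 5], [2, 4, 6]))  # 6
print(asymptotic_moments(10, 10))            # (10.0, 5.0)
```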

Another test based on ranks was advanced by Dixon [2], using as statistic U(R) the random variable

$$C^2 = \sum_{j=1}^{m_1+1} \Big[(m_1+1)^{-1} - \frac{n_j}{m_2}\Big]^2,$$

where the integers n_j are defined thus: Let Z_1 < Z_2 < … < Z_{m_1} denote the measurements of the first sample arranged in rank order. Then n_j is the number of measurements in the second sample falling in the interval (Z_{j−1}, Z_j), where we define Z_0 = −∞, Z_{m_1+1} = +∞. Large values of C² are significant. Dixon tabulated the 1%, 5%, and 10% significant values of C² for m_1, m_2 = 2, 3, …, 10; for larger m_1, m_2 he fitted a χ²-distribution.
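A direct transcription of the definition of C², as a Python sketch (the function name is ours), counts how many Y's fall in each gap between consecutive ordered first-sample values and sums the squared discrepancies from the equal share (m_1 + 1)^{−1}. No ties between the samples are assumed.

```python
from bisect import bisect_left
from fractions import Fraction

def dixon_c2(xs, ys):
    """C^2 = sum_j [(m1+1)^-1 - n_j/m2]^2, where n_j counts the Y's falling
    in the j-th gap (Z_{j-1}, Z_j) of the ordered first sample."""
    z = sorted(xs)
    m1, m2 = len(xs), len(ys)
    n = [0] * (m1 + 1)
    for y in ys:
        n[bisect_left(z, y)] += 1  # number of Z's below y = 0-based gap index
    return sum((Fraction(1, m1 + 1) - Fraction(nj, m2)) ** 2 for nj in n)

print(dixon_c2([2, 4], [1, 3, 5]))  # 0
print(dixon_c2([2, 4], [5, 6, 7]))  # 2/3
```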

A paper by Smirnoff [33, 16] suggests the following as a statistic U(R): Let G_{m_1}(x) and G_{m_2}(x) be the “empirical distribution functions” of the first and second samples, that is, m_i G_{m_i}(x) is the number of measurements in the i-th sample < x (i = 1, 2), and take*

$$U(R) = \big(m_1^{-1} + m_2^{-1}\big)^{-1/2} \sup_x \big|G_{m_1}(x) - G_{m_2}(x)\big|,$$

with large values significant. Smirnoff showed that if ν = 2 the asymptotic distribution of his statistic U(R) is a certain c.d.f. Φ(λ), previously introduced by Kolmogoroff [15]. More specifically, let Φ_{m_1,m_2}(λ) = Pr{U(R) < λ; ν = 2}. Then if n → ∞ with m_1/m_2 fixed, we have Φ_{m_1,m_2}(λ) → Φ(λ). The definition of Φ(λ) and references to tables of Φ(λ) are given in section 9. If instead of assuming ν = 2 we take ν = 0, the risk of type I errors may be controlled to the extent that Pr{rejecting H} ≤ α for all F_n ∈ ω, by employing Smirnoff's theorem stating Pr{U(R) < λ; ν = 0} ≥ Φ_{m_1,m_2}(λ), where Φ_{m_1,m_2}(λ) is defined above.
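Smirnoff's statistic can be computed as in the Python sketch below (function name ours). Because both empirical distribution functions are step functions, the supremum of their difference is attained just to the right of some pooled observation, so it suffices to compare, at each pooled point t, the proportions of each sample that are ≤ t. For large samples the result would be compared with the tabulated λ_α of section 9.

```python
from bisect import bisect_right
from math import sqrt

def smirnoff_statistic(xs, ys):
    """U(R) = (1/m1 + 1/m2)^(-1/2) * sup_x |G_m1(x) - G_m2(x)|.
    The supremum is attained just to the right of some pooled observation,
    so it suffices to compare the proportions of values <= each pooled point."""
    a, b = sorted(xs), sorted(ys)
    m1, m2 = len(a), len(b)
    sup = 0.0
    for t in a + b:
        sup = max(sup, abs(bisect_right(a, t) / m1 - bisect_right(b, t) / m2))
    return sup / sqrt(1 / m1 + 1 / m2)

print(smirnoff_statistic([1, 2, 3], [4, 5, 6]))  # about 1.2247
```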

A test for the problem of two samples obtained by Wolfowitz by a modification of the likelihood ratio procedure will be discussed in section 12. When m_1 = m_2, the non-parametric analysis of variance tests of the “randomized blocks” type described in section 6 might also be used to test the more restricted hypothesis considered in this section.

The non-parametric problem of k samples has been attacked by Welch [46], who used the method of randomization with the analysis of variance ratio as statistic T(E), and by Wolfowitz [50] with his modified likelihood ratio method.

In this as in all the other sections where several solutions of the same problem of statistical inference are described, the question as to the relative merits of the various solutions arises, and in every case the question is as yet mostly or entirely unanswered. The only easy conclusion about the tests of this section would seem to be that the tests of K. Pearson and Pitman are not consistent with respect to certain subclasses of the admissible alternatives, according to the definition of section 11.

* We use the notations sup and inf respectively for least upper bound and greatest lower bound.

5. Independence. The classes Ω and ω defining the problem of independence have already been stated in section 2, in which we described Pitman's test [28] based on the randomization method and the use of |r| as statistic T(E), where r is the sample value of the Pearsonian correlation coefficient. Pitman fitted an incomplete Beta-distribution to the subpopulation distribution of r² and found the resulting approximation for ν = 2 equivalent to the usual test employing the t-distribution and valid for the case of normality.

In section 2 we also mentioned the test earlier proposed by Hotelling and Pabst [9], which is based on the method of ranks and employs the statistic U(R) = |r'|, where r' is the Spearman rank correlation coefficient. They proved that for ν = 2 the distribution of r' is asymptotically normal if F_n ∈ ω. Pitman's fitting of an incomplete Beta-distribution applies also to (r')², and Kendall, Kendall, and Smith [12] made numerical calculations indicating that this gives a better approximation than the normal distribution. Since r' is calculated from Σd², the sum of the squared rank differences, the latter may equivalently be used as the statistic U(R), small and large values of Σd² being now both significant. Kendall, Kendall, and Smith [12] found the exact distribution of Σd² for the number of pairs m ≤ 8. This work was anticipated by Olds [23], who calculated the exact distribution of Σd² for m ≤ 7 and, by fitting certain distributions for m > 7, gave a very useful table of the 1%, 2%, 4%, 10% and 20% significant values of Σd² for m ≤ 30. It would be desirable to have these tables extended by inclusion of the 5% values.
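For small m the exact null distribution of Σd² can be enumerated over all m! rankings, as Olds and the Kendalls did by other means. The Python sketch below is ours (function names hypothetical). It uses r' = 1 − 6Σd²/[m(m² − 1)] and the null mean m(m² − 1)/6 of Σd², and treats a two-sided level as the probability of a deviation from that mean at least as large as the observed one, which is our choice of convention. No ties are assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def spearman_from_d2(sum_d2, m):
    """r' = 1 - 6 sum d^2 / (m (m^2 - 1))."""
    return 1 - Fraction(6 * sum_d2, m * (m * m - 1))

def exact_sum_d2_distribution(m):
    """Null distribution of sum d^2 over all m! equally likely rankings."""
    counts = Counter(sum((i - p) ** 2 for i, p in enumerate(perm))
                     for perm in permutations(range(m)))
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

def two_sided_level(sum_d2, m):
    """Pr{|sum d^2 - mean| >= |observed - mean|}, mean = m (m^2 - 1) / 6."""
    mean = Fraction(m * (m * m - 1), 6)
    deviation = abs(sum_d2 - mean)
    return sum(p for s, p in exact_sum_d2_distribution(m).items()
               if abs(s - mean) >= deviation)

print(two_sided_level(0, 3))  # 1/3
```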

M. G. Kendall [10] proposed another measure of rank correlation whose significant values are easier to calculate than those of Σd², but since the Olds tables for the latter are available, Kendall's innovation does not seem to possess much practical advantage. Wolfowitz [50], using his modified likelihood ratio method, gave another test for independence and generalized it to the problem of independence of k random variables.
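One way to see why Kendall's coefficient is easier to handle (our illustration, not Kendall's own presentation): it depends on the number of discordant pairs, which for two rankings equals the number of inversions of a permutation, and the null distribution of that count over all m! permutations obeys a simple recurrence, whereas the distribution of Σd² must essentially be enumerated. A Python sketch with hypothetical function names; no ties are assumed.

```python
def inversion_counts(m):
    """counts[q] = number of the m! permutations with exactly q inversions
    (discordant pairs), built up one item at a time: inserting the j-th item
    into a permutation of j - 1 items creates 0, 1, ..., j - 1 new inversions."""
    counts = [1]
    for j in range(2, m + 1):
        new = [0] * (len(counts) + j - 1)
        for q, c in enumerate(counts):
            for extra in range(j):
                new[q + extra] += c
        counts = new
    return counts

def kendall_tau(xs, ys):
    """tau = (concordant - discordant) / (m (m - 1) / 2); assumes no ties."""
    m = len(xs)
    s = 0
    for i in range(m):
        for j in range(i + 1, m):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)
    return s / (m * (m - 1) / 2)

print(inversion_counts(4))  # [1, 3, 5, 6, 5, 3, 1]
```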

6. Analysis of variance. We suppose that we have n = rc measurements arranged in a rectangular layout of r rows and c columns. The r rows might correspond to the blocks and the c columns to the varieties in an agricultural experiment. The null hypothesis H is that of “no difference” in the column effects. The measurement in the i-th row and j-th column is supposed to be on a random variable* X_ij with c.d.f. F^{(ij)}(x) = Pr{X_ij < x}. Let us assume at first that all the X_ij are independent. The joint c.d.f. of the random variables X_1j, …, X_rj of the j-th column is then

$$F^{(j)}(x_1, \dots, x_r) = \Pr\{X_{1j} < x_1, \dots, X_{rj} < x_r\} = \prod_{i=1}^{r} F^{(ij)}(x_i).$$

* The double subscript notation is more convenient here than that used in section 2; after the class ω has been defined the reader will see that the numbers n_k used in section 2 to describe the symmetry of the F_n ∈ ω are all equal to c, and each X_ij of the present section coincides with one of the coordinates of section 2.
The symbol F_n for the joint c.d.f. of all n random variables now denotes F_n(x_11, …, x_1c; …; x_r1, …, x_rc). Ω is the class of all F_n of the form

$$F_n = \prod_{j=1}^{c} F^{(j)}(x_{1j}, \dots, x_{rj}),$$

where F^{(j)} is defined by the preceding equation, and all F^{(ij)} are in a given class Ω_ν. The hypothesis H states that the column distributions are all the same,

$$F^{(j)}(x_1, \dots, x_r) = F^{(1)}(x_1, \dots, x_r) \qquad (j = 2, 3, \dots, c),$$

without specifying F^{(1)}. ω is thus the subclass of Ω comprising all F_n of the form

$$F_n = \prod_{j=1}^{c} F^{(1)}(x_{1j}, \dots, x_{rj}).$$

The F_n in ω may be written

$$F_n = \prod_{i=1}^{r} \Big\{\prod_{j=1}^{c} F^{(i1)}(x_{ij})\Big\}.$$

Regarding the factor in braces for fixed i, we see that it is left unchanged by any permutation of the c coordinates x_i1, …, x_ic. The set S of permutations is thus determined, and the subpopulation {E'} consists of the (c!)^r points obtained by permuting among themselves the first set of c coordinates, the second set of c coordinates, …, the r-th set of c coordinates of E = (x_11, …, x_1c; …; x_r1, …, x_rc).

The above argument leading to the subpopulation {E'} of (c!)^r points is based squarely on the assumed independence of the n random variables X_ij. Suppose now that the X_ij are not known to be independent, as may happen in agricultural experiments [24]. To make the discussion concrete, suppose that in the r × c layout we have been considering the rows refer to blocks (of plots) and the columns to varieties, so that the random variable X_ij is the yield of the j-th variety on the i-th block. We owe to R. A. Fisher the method of including early in the experiment a random process which leads to the same “equally likely” subpopulation of points {E'} obtained before in the case of independence. This physical process, which he calls “randomization”, then permits the construction of critical regions by the “method of randomization” in the sense in which we have been using the term.

To explain the experimental process of randomization we shall imagine another r × c layout and a random set of mappings of the two layouts onto each other. In each block there are c plots, and we now assume these numbered from 1 to c, the numbering to be held fixed. The second layout refers to the plots; the rows again correspond to the blocks, but the columns now correspond to the number of the plot in the block; thus the i, t cell represents the t-th plot in the i-th block. Now consider all 1:1 correspondences or mappings between the two layouts such that the i-th row always maps onto the i-th row (i = 1, …, r). There are s = (c!)^r such mappings M_k (k = 1, …, s). Suppose under the mapping M_k the i, t cell in the block-plot layout maps on the i, j_k cell of the block-variety layout, where j_k = j_k(i, t), and the i, j cell of the latter corresponds to the i, t_k cell of the former, t_k = t_k(i, j). The physical randomization process consists of choosing the mapping M_k so that all s mappings have the same probability 1/s of being chosen. In other words, the randomized block pattern is selected in such a way that all the s possible patterns have equal probabilities of being adopted in the experiment. Now let Y_it^{(k)} be the yield of the i, t plot if the variety assigned to it by the k-th pattern is planted there, and let G^{(k)}(y_11, …, y_rc) = Pr{all Y_it^{(k)} < y_it} be the joint c.d.f. of the Y_it^{(k)}. In calculating the c.d.f. F_n of the X_ij associated with the first layout we must take account of the random process by which it is mapped onto the second:

$$F_n(x_{11}, \dots, x_{rc}) = \Pr\{\text{all } X_{ij} < x_{ij}\} = \frac{1}{s}\sum_{k=1}^{s} \Pr\{\text{all } Y^{(k)}_{i,t_k(i,j)} < x_{ij}\} = \frac{1}{s}\sum_{k=1}^{s} G^{(k)}\big(x_{1,j_k(1,1)}, \dots, x_{r,j_k(r,c)}\big).$$

Ω consists of all F_n of the above form with the G^{(k)} in a given class. The hypothesis H of “no difference” of varieties asserts that the yields of the plots do not depend on the varieties planted on them, that is, that all G^{(k)} are the same, G^{(k)} = G^{(1)}, without specifying G^{(1)}. ω is the subclass of Ω whose members are of the form

$$F_n = \frac{1}{s}\sum_{k=1}^{s} G^{(1)}\big(x_{1,j_k(1,1)}, \dots, x_{r,j_k(r,c)}\big).$$

It is now seen that any permutation in the set S previously considered merely rearranges the terms of the above sum, so that F_n remains invariant, and we have the same subpopulation {E'} as before.

It is to be understood henceforth that either the X_ij are known to be independent or else an experimental randomization has been carried out as described above, so that in either case the above set {E'} of (c!)^r points is the “equally likely” subpopulation.
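The physical randomization described above amounts to choosing, independently for each block, a uniformly random assignment of the c varieties to the c plots. The Python sketch below (function names hypothetical) does this with a seeded pseudo-random generator, which here merely stands in for the physical random process; random.shuffle produces each permutation of a block with equal probability, so each of the s = (c!)^r patterns has probability 1/s.

```python
import random
from math import factorial

def randomize_blocks(r, c, rng=None):
    """Pick one randomized-block pattern: layout[i][t] is the variety
    (0, ..., c-1) assigned to plot t of block i.  Each block receives an
    independent, uniformly random permutation, so each of the s = (c!)**r
    patterns is chosen with probability 1/s."""
    rng = rng or random.Random()
    layout = []
    for _ in range(r):
        varieties = list(range(c))
        rng.shuffle(varieties)
        layout.append(varieties)
    return layout

def number_of_patterns(r, c):
    """s = (c!)**r."""
    return factorial(c) ** r

layout = randomize_blocks(4, 3, random.Random(1943))
print(number_of_patterns(4, 3))  # 1296
```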

The first application in the literature of the randomization method is found in R. A. Fisher's “sign test” or “binomial series test” [3] for the case of randomized blocks with two columns (c = 2). Let D_i be the difference X_i1 − X_i2. The statistic used is a function of the ranking only, namely the number of D_i > 0, small and large values being significant. For ν = 2 its distribution under the null hypothesis is the binomial distribution with the n and p of the usual notation equal respectively to r and ½. This test may be regarded as the special case c = 2 of Friedman's rank method for analysis of variance, to be described below.
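The sign test reduces to a binomial tail computation. The Python sketch below is ours (function name hypothetical). Discarding zero differences and doubling the smaller binomial tail to get a two-sided level are conventions we assume here; the text above states only that small and large counts are both significant.

```python
from fractions import Fraction
from math import comb

def sign_test(col1, col2):
    """Number of D_i = X_i1 - X_i2 that are positive, and a two-sided exact
    level from Binomial(r, 1/2).  Zero differences are discarded and the
    smaller tail is doubled; both are conventions we assume here."""
    diffs = [a - b for a, b in zip(col1, col2) if a != b]
    r = len(diffs)
    pos = sum(1 for d in diffs if d > 0)

    def at_most(k):  # Pr{count <= k} under Binomial(r, 1/2)
        return Fraction(sum(comb(r, j) for j in range(k + 1)), 2 ** r)

    lower = at_most(pos)
    upper = 1 - at_most(pos - 1)
    return pos, min(Fraction(1), 2 * min(lower, upper))

print(sign_test([5, 6, 7, 8, 9, 10], [1, 1, 1, 1, 1, 1]))  # (6, Fraction(1, 32))
```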

Fisher later [5] proposed another test for the case c = 2 not based on ranks, employing as statistic T(E) the absolute value of the mean of the D_i defined above, with large values significant. The exact distribution of this statistic is very laborious to calculate unless r is very small, and K. R. Nair [20] pointed out that the use of the numerical value of the median of the D_i (or one of the two central values when r is even) had the advantage of a very easily calculated distribution (if ν = 2). The latter may be regarded as a modification of the rank method, the method of ranks being applied not in the 2r-dimensional sample space as described in section 2 but in the r-dimensional space of the differences D_i. Nair also showed that the distributions of the range and of the midpoint of the range of the D_i are very simple.
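For c = 2 the (2!)^r = 2^r equally likely points of the subpopulation correspond to interchanging, or not, the two entries in each block, that is, to changing the sign of each D_i. The Python sketch below (our illustration, function name hypothetical) enumerates these sign patterns to get the exact level of Fisher's statistic; the 2^r terms show why the calculation becomes laborious unless r is small.

```python
from fractions import Fraction
from itertools import product

def paired_randomization_level(diffs):
    """Exact Pr{|mean D| >= observed} over the 2**r equally likely sign
    patterns of the observed differences (the (2!)**r subpopulation)."""
    d = [Fraction(x) for x in diffs]
    r = len(d)
    observed = abs(sum(d)) / r
    hits = sum(1 for signs in product((1, -1), repeat=r)
               if abs(sum(s * x for s, x in zip(signs, d))) / r >= observed)
    return Fraction(hits, 2 ** r)

print(paired_randomization_level([1, 2, 3]))  # 1/4
```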

From here on we consider the general case c > 2, but when we speak of distributions they will be understood to be for the case when the null hypothesis is true and ν = 2. Welch [45] considered using as T(E) the usual analysis of variance ratio appropriate to testing for “no difference” of column effects. He transformed this to another statistic and calculated two moments of its subpopulation distribution. The first moment always agrees with that obtained under “normal theory”, that is, for the case X_ij = C_i + Z_ij, where the C_i are constants and the Z_ij are independently normally distributed with the same variance and zero means, but the second moment depends on the subpopulation {E'}. Here the exact distribution of the statistic is of course in general much more tedious to calculate than in the previous case c = 2; an incomplete Beta-distribution was fitted by Welch. Welch anticipated Pitman [29], who obtained the same results and got besides the third and fourth moments of Welch's statistic.

The method of ranks was applied by Friedman [7], who employed as statistic U(R) a quantity formed as follows: Rank each set of row entries (for fixed i) in ascending order of magnitude, and let r_ij be the rank of X_ij, so that r_i1, …, r_ic is a rearrangement of the integers 1, …, c. Let r̄_j be the mean rank of the j-th column, r̄_j = Σ_i r_ij / r, and take for U(R)

$$U = a_{r,c} \sum_{j=1}^{c} \big[\bar r_j - \mathcal{E}(\bar r_j)\big]^2,$$

where a_{r,c} is a certain constant, and ℰ(r̄_j) is calculated under the null hypothesis. For Friedman's choice of a_{r,c}, U may be rapidly computed from the equivalent formula

$$U = -3r(c+1) + \frac{12}{rc(c+1)} \sum_{j=1}^{c} \Big(\sum_{i=1}^{r} r_{ij}\Big)^2.$$

In his paper Friedman included a proof by Wilks that U has asymptotically the χ²-distribution with c − 1 degrees of freedom as r → ∞. Kendall and Smith [13] fitted to a transform of U a Fisher z-distribution with continuity corrections, obtaining a better approximation for small r than the χ²-distribution. Wallis [43] independently proposed the use of U/[r(c − 1)] as statistic and called it the rank correlation ratio. Friedman, in a later paper [8] on the subject, using exact values he had calculated together with the Kendall-Smith approximation, published tables* of the 1% and 5% significant values of U for c = 3, 4, 5, 6, 7, and sufficiently many values of r so that for these c and any r the significant values of U are now easily available.

* In these tables our U, r, c are denoted respectively by χ_r², m, n.
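Friedman's U is quickly computed from the rank sums. The Python sketch below (function names hypothetical) evaluates both displayed forms. In the defining form it takes ℰ(r̄_j) = (c + 1)/2 and a_{r,c} = 12r/[c(c + 1)]; the latter value is our own inference, being the constant for which the two forms agree algebraically. Wallis's ratio U/[r(c − 1)] is included as well. No ties within rows are assumed.

```python
from fractions import Fraction

def row_ranks(row):
    """Ranks 1..c of one row (no ties assumed)."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks

def column_rank_sums(table):
    ranks = [row_ranks(row) for row in table]
    return [sum(row[j] for row in ranks) for j in range(len(table[0]))]

def friedman_u(table):
    """Computing formula: 12/(r c (c+1)) * sum_j R_j^2 - 3 r (c+1)."""
    r, c = len(table), len(table[0])
    sums = column_rank_sums(table)
    return Fraction(12, r * c * (c + 1)) * sum(s * s for s in sums) - 3 * r * (c + 1)

def friedman_u_from_definition(table):
    """a_{r,c} * sum_j (mean rank_j - (c+1)/2)^2 with a_{r,c} = 12 r / (c (c+1))."""
    r, c = len(table), len(table[0])
    a = Fraction(12 * r, c * (c + 1))
    return a * sum((Fraction(s, r) - Fraction(c + 1, 2)) ** 2 for s in column_rank_sums(table))

def rank_correlation_ratio(table):
    """Wallis's U / [r (c - 1)], which lies between 0 and 1."""
    r, c = len(table), len(table[0])
    return friedman_u(table) / (r * (c - 1))

yields = [[10, 20, 30], [11, 21, 31], [12, 22, 32], [13, 23, 33]]
print(friedman_u(yields), rank_correlation_ratio(yields))  # 8 1
```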

After the above lengthy discussion of the “randomized blocks” case of analysis of variance, it will perhaps suffice merely to mention that the “Latin square” case may be similarly attacked from the non-parametric point of view, and this has been considered by Welch [45], Pitman [29], and E. S. Pearson [24]. They have taken as the statistic the usual analysis of variance ratio, and the work of Welch and Pitman in calculating the first two moments of its subpopulation distribution is even more tedious than in the “randomized blocks” case.

Part II. Non-parametric Estimation 

7. Classical results on point estimation. Throughout part II the symbol E will always denote a random sample X_1, …, X_n from a univariate population with c.d.f. F(x), where F is an unknown member of a given class to be stated in each case. The c.d.f. of E is thus

$$F_n(x_1, \dots, x_n) = \prod_{i=1}^{n} F(x_i).$$

The problems of estimation can be stated without reference to the class Ω of admissible F_n (whose definition would be obvious in every case).

Let θ = θ(F) be a real number determined by F (a functional of F) for F in a certain class of univariate c.d.f.'s. Thus θ might be the mean of the distribution, in which case θ would be defined for all F possessing a first moment. We shall not call θ a parameter, in order to avoid confusion with the parametric case. R. A. Fisher's criteria of unbiasedness and of consistency for point estimation carry over without change from the parametric case. A statistic T(E) is said to be an unbiased estimate of θ if ℰ(T) = θ. Write E = E_n and T = T_n to emphasize the sample size n, and assume that the statistic T_n(E_n) is defined for all n (or all n > some n_0). Then we define T_n(E_n) to be a consistent estimate of θ if it converges stochastically to θ, that is, if Pr{|T_n − θ| > h} → 0 as n → ∞, for every h > 0.

In the present paragraph it will be convenient to symbolize the class of F for which the i-th (absolute) moment exists; we denote it by Ω_(i) (i = 1, 2, …). It is known* that a sufficient condition for the stochastic convergence of the sample mean x̄ to the population mean is that F ∈ Ω_(1). Hence for all F ∈ Ω_(1), x̄ is a consistent estimate of the population mean; furthermore it is unbiased. If we apply this result to the random variable Y = X², we find that for all F ∈ Ω_(2), Σ_{i=1}^n X_i²/n is a consistent unbiased estimate of the second moment of F about the origin. Similar statements may be made for higher moments. For F ∈ Ω_(2) one may show further that, with Q defined as Σ_{i=1}^n (X_i − x̄)², the statistics Q/n and Q/(n − 1) are consistent estimates of the population variance, and the latter is unbiased.

* See, for example, J. L. Doob, Annals of Math. Stat., Vol. 6 (1935), p. 163.
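The unbiasedness of Q/(n − 1), and the bias of Q/n, can be checked exactly on a small discrete population of our own choosing by enumerating every sample of size n. The Python sketch below is illustrative only (function name hypothetical); it says nothing about consistency, which is a limiting property. Values are assumed to be integers or fractions so that the arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product

def exact_variance_estimates(values, probs, n):
    """Return (population variance, E[Q/n], E[Q/(n-1)]) for samples of size n
    from a finite distribution, by enumerating all len(values)**n samples."""
    mean = sum(p * v for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    expected_q = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        xs = [values[i] for i in idx]
        xbar = Fraction(sum(xs), n)
        expected_q += weight * sum((x - xbar) ** 2 for x in xs)
    return variance, expected_q / n, expected_q / (n - 1)

half = Fraction(1, 2)
print(exact_variance_estimates([0, 1], [half, half], 3))
# (Fraction(1, 4), Fraction(1, 6), Fraction(1, 4))
```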
If there exists a number M such that F(M) = ½, it is called the median of the distribution. The median x̃ of a sample of odd size is the central X_i when the X_i are arranged in order of magnitude; for a sample of even size we may take x̃ to be the average of the two central values. It may be shown* that x̃ is a consistent estimate of M for those F whose probability density function f(x) is continuous at x = M with f(M) ≠ 0.

8. Confidence intervals for an unknown median, for the difference of medians.

Arrange the sample in rank order and denote the result by Z_1 < Z_2 < … < Z_n, where Z_1, …, Z_n is a rearrangement of X_1, …, X_n. The joint distribution of the Z_i (or any subset of the Z_i) is well known [49] if F(x) is restricted to Ω_4, which we now assume. From this distribution theory it is easy to show that for any positive integer k < ½n, the probability that the random interval (Z_k, Z_{n−k+1}) covers the unknown population median M is

$$\Pr\{Z_k < M < Z_{n-k+1}\} = 1 - 2 I_{1/2}(n-k+1,\ k),$$

where

$$I_x(p, q) = \frac{1}{B(p, q)} \int_0^x t^{p-1}(1-t)^{q-1}\, dt$$

is the incomplete Beta-distribution tabulated by K. Pearson, B(p, q) being the complete Beta function. The practicability of estimating M by means of the above relation in the non-parametric case was first noted by W. R. Thompson [35]. It is not difficult to calculate tables giving, for various sample sizes n, the maximum k for which Pr{Z_k < M < Z_{n−k+1}} ≥ .95 or .99. This has been done for n = 6 to 81 by K. R. Nair [21], who listed the maximum k as well as n − k + 1 and I_{1/2}(n − k + 1, k), so that the exact confidence coefficient is available. Nair also gave asymptotic formulas which are very accurate for n > 81.
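A table of the kind Nair computed can be sketched in Python (function names hypothetical). For integer arguments I_{1/2}(n − k + 1, k) equals Pr{Binomial(n, ½) ≤ k − 1}, the probability that fewer than k observations fall below M, so the coverage can be summed exactly. The sketch finds the largest k < ½n attaining a given level and returns the interval (Z_k, Z_{n−k+1}) with its exact confidence coefficient.

```python
from fractions import Fraction
from math import comb

def median_coverage(n, k):
    """Pr{Z_k < M < Z_{n-k+1}} = 1 - 2 I_{1/2}(n-k+1, k); for integer
    arguments I_{1/2}(n-k+1, k) = Pr{Binomial(n, 1/2) <= k-1}."""
    return 1 - Fraction(2 * sum(comb(n, j) for j in range(k)), 2 ** n)

def max_k(n, level):
    """Largest k < n/2 whose interval has coverage >= level, or None."""
    best = None
    for k in range(1, (n + 1) // 2):
        if median_coverage(n, k) < level:
            break
        best = k
    return best

def median_interval(sample, level=Fraction(95, 100)):
    z = sorted(sample)
    k = max_k(len(z), level)
    if k is None:
        raise ValueError("sample too small for this confidence level")
    return z[k - 1], z[len(z) - k], median_coverage(len(z), k)

print(max_k(10, Fraction(95, 100)), median_coverage(10, 2))  # 2 501/512
```

With the .95 level, max_k returns None for n = 5 and 1 for n = 6, which is consistent with Nair's tables starting at n = 6.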

It is clear how confidence intervals for the difference d = M_2 − M_1 of the medians of two univariate populations with c.d.f.'s known only to be in Ω_1 might be obtained by combining two probability statements of the above kind. Let the desired confidence coefficient be 1 − α, and form confidence intervals of the above type for M_1 and M_2 with confidence coefficient 1 − ½α; write them Pr{M_i' < M_i < M_i''} ≥ 1 − ½α (i = 1, 2). Then Pr{M_2' − M_1'' < d < M_2'' − M_1'} ≥ 1 − α. Solutions like this, which are easily obtained by the combining method in many problems, are in general not very efficient.
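The combining method is mechanical. The Python sketch below (function names hypothetical) forms order-statistic intervals (M_i', M_i'') for each median at level 1 − ½α, choosing for each sample the largest k < ½n that attains that level, and then takes (M_2' − M_1'', M_2'' − M_1'), whose coverage is at least 1 − α because each of the two intervals fails with probability at most ½α.

```python
from fractions import Fraction
from math import comb

def coverage(n, k):
    """Pr{Z_k < M < Z_{n-k+1}} for a continuous F."""
    return 1 - Fraction(2 * sum(comb(n, j) for j in range(k)), 2 ** n)

def median_limits(sample, level):
    """(Z_k, Z_{n-k+1}) with the largest k < n/2 giving coverage >= level."""
    z = sorted(sample)
    n = len(z)
    ks = [k for k in range(1, (n + 1) // 2) if coverage(n, k) >= level]
    if not ks:
        raise ValueError("sample too small for this level")
    k = max(ks)
    return z[k - 1], z[n - k]

def median_difference_limits(xs, ys, alpha):
    """Limits (M_2' - M_1'', M_2'' - M_1') for d = M_2 - M_1 from two intervals,
    each at level 1 - alpha/2; joint coverage is then at least 1 - alpha."""
    level = 1 - Fraction(alpha) / 2
    lo1, hi1 = median_limits(xs, level)
    lo2, hi2 = median_limits(ys, level)
    return lo2 - hi1, hi2 - lo1

xs = list(range(1, 11))
ys = [x + 100 for x in range(1, 11)]
print(median_difference_limits(xs, ys, Fraction(1, 10)))  # (93, 107)
```

The resulting interval is wide relative to the spread of each sample, which illustrates the remark that such combined solutions are in general not very efficient.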

Some work of Pitman's [27] may be regarded as a solution of the problem of estimating the difference of medians (or other quantiles, or means) of two populations in a case essentially more restricted than the preceding, but more general than the corresponding parametric case in which the distributions are assumed to differ only in location. To describe the nature of Pitman's result, let us revert to the notation introduced at the beginning of section 4, but add to the assumption that F and G are in a known class Ω_ν the restrictive assumption that F and G differ only in location, that is, that G(x) = F(x − d). The problem is the interval estimation of the unknown constant d. Define the random variables Z_i = Y_i − d. After noting that the m_1 + m_2 random variables X_1, …, X_{m_1}, Z_1, …, Z_{m_2} are all independently distributed with the same c.d.f. F, Pitman was able to apply his results for the problem of two samples to show how functions d' and d'' of X_1, …, X_{m_1}, Y_1, …, Y_{m_2} could be calculated such that Pr{d' < d < d''} ≥ 1 − α for F ∈ Ω_0, while for ν = 2 the equality holds. After fitting an incomplete Beta-distribution, Pitman found that the resulting approximate confidence intervals coincide with the well-known ones employing the t-distribution and based on the assumption that F and G are normal with the same unknown variance.

* This follows from the asymptotic distribution of x̃. See, for instance, [49], and combine section 4.63 with Theorem (A), p. 134.

9. Confidence limits for an unknown distribution function. Consider in an x, y-plane the graph g of the unknown c.d.f., g being the locus of the equation y = F(x), and the possibility of covering g with random regions ℜ(E) depending on the sample E. Wald and Wolfowitz [39] have shown how for given n and α it is possible in a large variety of ways to define regions ℜ(E) such that Pr{ℜ(E) ⊃ g}, the probability that the random region ℜ(E) covers the unknown graph g, is 1 − α for all F ∈ Ω_1. Instead of describing their general method we shall limit ourselves to a special case. This is a very neat solution, the necessary distribution theory for which was developed earlier by Kolmogoroff [15].

Let Gn{x) be the “empirical distribution function” of the sample: nGnix) is 
• the number of Xi < x. Define the random variable 

Dn - Vn sup I F(x) — Gn{x) ], 

9 

and let *n(X) be the c.d.f. of Dn , #„(X) = Pr{D« < X}. Kolmogoroff proved 
that #»(X) is independent of F eik, and that as n — »• <» , #n(X) -*■ $ (X) uni- 
formly in X, where #(X) is defined by the rapidly converging Dirichlet series 

W = £ (-l)*exp (-2jfe*X*). 

A small table of values of the function $\Phi(\lambda)$ was given by Kolmogoroff [15], and a larger one by Smirnoff [33]. Define $\lambda_{n,\alpha}$ from $\Phi_n(\lambda_{n,\alpha}) = 1 - \alpha$, and $\lambda_\alpha$ from $\Phi(\lambda_\alpha) = 1 - \alpha$. Values of $\lambda_\alpha$ for $\alpha = .05, .02, .01, .005, .002, .001$ were listed by Kolmogoroff [16]. Now $1 - \alpha$ is the probability that

$$\sqrt{n}\,\sup_x |F(x) - G_n(x)| < \lambda_{n,\alpha}$$

if $F \in \Omega_2$. The above inequality is equivalent to

$$G_n(x) - \lambda_{n,\alpha}/\sqrt{n} < F(x) < G_n(x) + \lambda_{n,\alpha}/\sqrt{n} \quad (\text{all } x).$$

If we take as $\mathfrak{R}(E)$ the intersection of the region between the graphs of the functions $G_n(x) \pm \lambda_{n,\alpha}/\sqrt{n}$ with the strip $0 \le y \le 1$, we have $\Pr\{\mathfrak{R}(E) \supset g\} = 1 - \alpha$. The values of $\lambda_{n,\alpha}$ have not been tabulated, but for practical purposes of determining an unknown c.d.f. one would usually require a large $n$, and the tabulated values of $\lambda_\alpha$ could then be used.
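To make the large-sample belt concrete, here is a minimal sketch in Python (standard library only). It evaluates the series for $\Phi(\lambda)$, truncated at 100 terms (our choice), solves $\Phi(\lambda_\alpha) = 1 - \alpha$ by bisection, and returns $G_n(x) \pm \lambda_\alpha/\sqrt{n}$ clipped to $[0, 1]$ at each order statistic. The function names are hypothetical, and using $\lambda_\alpha$ in place of $\lambda_{n,\alpha}$ is the large-$n$ approximation just described, not an exact finite-sample belt.

```python
import math


def kolmogorov_cdf(lam, terms=100):
    """Limiting c.d.f. Phi(lam) = sum over all integers k of (-1)^k exp(-2 k^2 lam^2)."""
    if lam <= 0:
        return 0.0
    # The k and -k terms are equal, so the series folds into 1 + 2 * sum_{k >= 1}.
    tail = sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return 1.0 + 2.0 * tail


def critical_value(alpha, lo=0.3, hi=3.0, tol=1e-10):
    """Solve Phi(lam_alpha) = 1 - alpha by bisection; Phi increases with lam."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confidence_band(sample, alpha):
    """Large-sample belt G_n(x) -/+ lam_alpha / sqrt(n), clipped to 0 <= y <= 1.

    Returns (x, lower, upper) at each order statistic; G_n equals i/n from the
    i-th order statistic up to (but not including) the next one.
    """
    xs = sorted(sample)
    n = len(xs)
    half_width = critical_value(alpha) / math.sqrt(n)
    band = []
    for i, x in enumerate(xs, start=1):
        g = i / n
        band.append((x, max(0.0, g - half_width), min(1.0, g + half_width)))
    return band
```

For instance, `critical_value(0.05)` comes out near 1.358, so with $n = 400$ the half-width of the belt is about .068.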

With $\Phi_n(\lambda)$ defined as the c.d.f. of $D_n$ for $F \in \Omega_2$, Kolmogoroff has shown further that for $F \in \Omega_0$, $\Pr\{D_n < \lambda\} \ge \Phi_n(\lambda)$. This gives the beautiful result that the above confidence belt is valid in the most general case where $F \in \Omega_0$, in the sense that for the above defined $\mathfrak{R}(E)$, $\Pr\{\mathfrak{R}(E) \supset g\} \ge 1 - \alpha$.

10. Tolerance limits. An ingenious formulation and solution of a non-parametric estimation problem was given by Wilks [47]. Let us say that an interval $(x', x'')$ covers a proportion $\pi$ of a population with c.d.f. $F(x)$ if $F(x'') - F(x') = \pi$. In the notation of section 8, Wilks considered the proportion $B$ covered by the interval $(Z_k, Z_{n-m+1})$ extending from the $k$-th smallest observation to the $m$-th largest, $B = F(Z_{n-m+1}) - F(Z_k)$. $B$ is a random variable depending on the sample, but it is not a statistic, since it depends also on the unknown c.d.f. $F(x)$. However, Wilks noted that the c.d.f. $G(b)$ of $B$ is independent of $F \in \Omega_4$; in fact, for $0 \le b \le 1$,

$$G(b) = I_b(n - k - m + 1,\; k + m),$$

where $I_x(p, q)$ is defined in section 8. After $k$, $m$, a fixed proportion $b$, and a confidence coefficient $1 - \alpha$ have been chosen, the equation $G(b) = \alpha$ determines the sample size $n$, for which we can then make the following assertion without any knowledge of $F$ except that $F \in \Omega_4$: the probability is $1 - \alpha$ that in a sample of size $n$ the random interval $(Z_k, Z_{n-m+1})$ will cover at least $100b\%$ of the population.
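For integer arguments the incomplete Beta ratio reduces to a binomial tail, $I_b(p, q) = \Pr\{\mathrm{Binomial}(p + q - 1, b) \ge p\}$, and here $p + q - 1 = n$, so $G(b)$ and the required sample size can be computed directly. The sketch below (Python; the function names are ours) returns the smallest $n$ with $G(b) \le \alpha$. Since $G(b)$ decreases as $n$ grows, this is the “greatest value $\le \alpha$” choice of the footnote. In the multivariate cases discussed below, the same function applies with $k + m$ replaced by the sum of the $k_\nu + m_\nu$.

```python
from math import comb


def coverage_cdf(b, n, k, m):
    """G(b) = I_b(n - k - m + 1, k + m) for the proportion B covered by (Z_k, Z_{n-m+1}).

    Uses I_b(p, q) = P(Binomial(p + q - 1, b) >= p), valid for integer p, q;
    here p + q - 1 = n.
    """
    p = n - k - m + 1
    return sum(comb(n, j) * b ** j * (1 - b) ** (n - j) for j in range(p, n + 1))


def wilks_sample_size(b, alpha, k=1, m=1, n_max=100_000):
    """Smallest n with G(b) <= alpha, i.e. Pr{B >= b} >= 1 - alpha."""
    for n in range(k + m, n_max + 1):
        if coverage_cdf(b, n, k, m) <= alpha:
            return n
    raise ValueError("no n <= n_max satisfies G(b) <= alpha")
```

With $k = m = 1$ (the interval from the smallest to the largest observation), $b = .90$ and $\alpha = .05$ give $n = 46$, and $b = .99$, $\alpha = .05$ give $n = 473$.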

Wilks considered, among other extensions of his method, tolerance limits for multivariate distributions in which the variables are known to be independent, and the estimation of proportions in a second sample (instead of in the population) on the basis of a first sample [48]. The latter problem involves the calculation of $P(b; n, N, k, m)$, the probability that if a first sample of $n$ is taken and then a second sample of $N$, a proportion $b$ or more of the second sample will lie in the interval $(Z_k, Z_{n-m+1})$ determined from the first sample. Wilks' derivation of $P$ requires the assumption that $F \in \Omega_4$, but a simple auxiliary argument (related to the method of randomization by ranks) will extend the validity to the case $F \in \Omega_2$. The complete set of $n + N$ variates is independently distributed, each with the same c.d.f. $F \in \Omega_2$. All $(n + N)!$ possible rankings (excluding the “tied” ranking $R_0$) as defined in section 2 then have the same probability $1/(n + N)!$. The fraction of these rankings for which the statement about proportions in the second sample is correct is a function of $b, n, N, k, m$ only, and not of $F \in \Omega_2$, and this fraction is the desired $P$. Since $P$ is the same for all $F \in \Omega_2$, it must of course coincide with the value calculated by Wilks for $F \in \Omega_4$.
It would be desirable for practical purposes to extend the validity of the tolerance 

For fixed $b$, $G(b)$ of course takes on discrete values with $n$, so one would either choose the $n$ giving $G(b)$ the nearest value to $\alpha$ or else the greatest value $\le \alpha$.





HENRY SCHEFFÉ


limits of the first paragraph, concerning proportions in the population, at least to the case $F \in \Omega_2$. This extension would follow immediately if the intuitively reasonable statement $1 - G(b) = \lim_{N \to \infty} P(b; n, N, k, m)$ could be justified for $F \in \Omega_2$.

The multivariate case when independence is not assumed was successfully attacked by Wald [38]. We shall describe here his solution for the bivariate case. Let $(X_i, Y_i)$, $i = 1, \cdots, n$, be a sample from a population with bivariate c.d.f. $F(x, y) \in \Omega_4$, that is, $F$ is of the form

$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(\xi, \eta)\, d\eta\, d\xi,$$

where $f(x, y)$ is continuous, but otherwise unknown. Plot the points $(X_i, Y_i)$ in an $x, y$-plane and choose four (small) integers $k_1, m_1, k_2, m_2$. Draw vertical lines (parallel to the $y$-axis) passing through the points with the $k_1$-th smallest and $m_1$-th largest abscissas. Considering only the $n - k_1 - m_1$ points inside these vertical lines (the probability of equal abscissas is zero), draw two horizontal lines passing through the points with the $k_2$-th smallest and $m_2$-th largest ordinates. Let $J$ be the rectangle bounded by the four lines and consider the proportion $B$ of the population covered by the rectangle, $B = \iint_J dF(x, y)$. Then the c.d.f. $G(b)$ of $B$ is given by the previous formula in terms of the incomplete Beta-distribution with $k + m = k_1 + k_2 + m_1 + m_2$, and is thus independent of $f(x, y)$. Choose $k_1, m_1, k_2, m_2, b$, and $\alpha$. Then the equation $G(b) = \alpha$ determines the sample size $n$ for which the probability is $1 - \alpha$ that the random rectangle $J$ will cover at least $100b\%$ of the population. Wald showed further how a series of rectangles instead of a single rectangle might advantageously be used in the case of highly correlated $X, Y$.
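Wald's construction is easy to state as code. The sketch below (Python; names are ours, and ties are assumed absent, as the text allows) builds $J$ from a list of $(x, y)$ pairs. It then checks the distribution-free claim by simulation in one convenient member of $\Omega_4$, the uniform distribution on the unit square, where the covered proportion $B$ is simply the area of $J$. Because $G(b)$ is the Beta c.d.f. with parameters $n - k - m + 1$ and $k + m$, the mean of $B$ should be $(n - k - m + 1)/(n + 1)$, which equals $17/21$, about .81, for $n = 20$ and $k_1 = m_1 = k_2 = m_2 = 1$.

```python
import random


def wald_rectangle(points, k1, m1, k2, m2):
    """Return (x_lo, x_hi, y_lo, y_hi) of Wald's rectangle J for (x, y) pairs without ties."""
    xs = sorted(x for x, _ in points)
    n = len(xs)
    x_lo, x_hi = xs[k1 - 1], xs[n - m1]   # k1-th smallest and m1-th largest abscissas
    # Only the n - k1 - m1 points strictly between the vertical lines are used next.
    inner = sorted(y for x, y in points if x_lo < x < x_hi)
    y_lo, y_hi = inner[k2 - 1], inner[len(inner) - m2]
    return x_lo, x_hi, y_lo, y_hi


def mean_coverage_uniform(n, k1, m1, k2, m2, reps=5000, seed=0):
    """Monte Carlo mean of B when (X, Y) is uniform on the unit square (B = area of J)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        x_lo, x_hi, y_lo, y_hi = wald_rectangle(pts, k1, m1, k2, m2)
        total += (x_hi - x_lo) * (y_hi - y_lo)
    return total / reps
```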

It would be most useful to have tables of $n$ corresponding to $\alpha = .05$ and $.01$, some values of $b$ close to unity, and a few small values of $k + m$, say $k + m = 2, 4, \cdots, 2r$. The table could then be used for the univariate, bivariate, $\cdots$, $r$-variate cases with various choices of $k_\nu, m_\nu$ such that $\sum (k_\nu + m_\nu) = k + m$. Entries for $k + m = 4$ have been given by Wald [38, p. 55].

Part III. Toward a General Theory 

11. The criterion of consistency. All the concepts of Part III have been carried over from, or suggested by, corresponding ones earlier developed for the parametric theory. Consistency of point estimation was defined in section 7. Wald and Wolfowitz [40] have generalized the notion of consistency to tests so that it is applicable in the non-parametric case. We have heretofore specified the hypothesis $H$ and its admissible alternatives by means of classes of $n$-variate c.d.f.'s $F_n$. We now assume that $H$ and its admissible alternatives can be framed as statements about one or more populations, independent of $n$. Thus in the problem of two samples (section 4), $H$ may be taken as the statement that the c.d.f.'s $F$ and $G$ of the two populations are the same member of $\Omega_p$, while the admissible alternatives are statements that $F$ and $G$ are any two different members of $\Omega_p$. Returning to the general case, we assume that a sequence of tests is under consideration, say $T_1, T_2, \cdots$, such that as $j \to \infty$, the size of the sample in $T_j$ from each of the populations becomes infinite. The sequence $\{T_j\}$ may be called simply a “test” and is said to be consistent if the probability of rejection of $H$ by $T_j$ approaches unity as $j \to \infty$ whenever an admissible alternative to $H$ is true. It has been suggested [50] that consistency is a minimal requirement for a good test. In order to allow for the analogue of the “common best critical regions” in the parametric theory, it would be better to define consistency with respect to any given subset of the admissible alternatives and then require consistency with respect to the subset appropriate to the specific situation in which the test is to be used.

Wald and Wolfowitz [40] proved that under certain restrictions on the ad- 
missible F, G in the problem of two samples their test based on runs (section 4) 
is consistent, while another previously proposed test is not. Judging from 
their work, we may expect that, while inconsistency proofs may be easy, con- 
sistency proofs will be difficult. 
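Consistency can be watched, though not proved, by simulation. The sketch below (Python; all names and settings are ours) applies the test based on the total number of runs in the combined ordered sample, rejecting $H$ when there are too few runs. It uses the usual normal approximation to the null distribution, with mean $2n_1n_2/(n_1 + n_2) + 1$ and variance $2n_1n_2(2n_1n_2 - n_1 - n_2)/[(n_1 + n_2)^2(n_1 + n_2 - 1)]$ for samples of sizes $n_1, n_2$. The alternative, two normal populations differing by a unit shift in location, is an arbitrary choice. The point is only that the rejection rate climbs toward one as both sample sizes grow, which is what consistency asserts for a fixed admissible alternative; the conditions of [40] are not checked here.

```python
import math
import random


def runs_count(xs, ys):
    """Total number of runs of 1's and 2's in the combined ordered sample."""
    labels = [lab for _, lab in sorted([(v, 1) for v in xs] + [(v, 2) for v in ys])]
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def runs_test_rejects(xs, ys, z_crit=1.645):
    """One-sided runs test at about the 5% level, via the normal approximation."""
    n1, n2 = len(xs), len(ys)
    total = n1 + n2
    mean = 2 * n1 * n2 / total + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - total) / (total ** 2 * (total - 1))
    z = (runs_count(xs, ys) - mean) / math.sqrt(var)
    return z < -z_crit  # too few runs is evidence against H


def rejection_rate(size, shift, reps=500, seed=1):
    """Monte Carlo probability of rejecting H when Y is shifted by `shift` relative to X."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
        ys = [rng.gauss(shift, 1.0) for _ in range(size)]
        hits += runs_test_rejects(xs, ys)
    return hits / reps
```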

12. Likelihood ratio tests. A definition of the Neyman-Pearson likelihood ratio criterion $\lambda$ for testing the hypothesis $H$ (we use the notation of section 2), which would yield the usual result in the parametric case, would be the following. Let $C(E; \delta)$ be a cube of edge $2\delta$ in the sample space $W$ with center at the point $E$ and faces parallel to the coordinate hyperplanes, and let $P(E; \delta \mid F_n)$ be the “probability put into the cube by the c.d.f. $F_n$”, that is, $P(E; \delta \mid F_n) = \int_{C(E;\delta)} dF_n$. Define

$$\lambda(E; \delta) = \Big[\sup_{F_n \in \omega} P(E; \delta \mid F_n)\Big] \Big/ \Big[\sup_{F_n \in \Omega} P(E; \delta \mid F_n)\Big],$$

$$\lambda = \lambda(E) = \lim_{\delta \to 0} \lambda(E; \delta).$$

This definition of $\lambda$ is not useful in the non-parametric case, as $\lambda$ turns out in general to be independent of $E$; the reader may easily verify this for the problem of two samples (section 4).

Having seen now that the likelihood ratio does not carry over to the non-parametric case in an obvious way, we are in a position to appreciate a bold stroke by Wolfowitz [50]. He begins by limiting the critical regions to be considered to the relatively small class obtainable by the method of ranks (section 2). Let $R = R(E)$ be the ranking of the sample point $E$, so that the random variable $R$ takes on the possible values $R_0, R_1, \cdots, R_s$, and let $P(R_e \mid F_n) = \Pr\{R = R_e \mid F_n\}$.


J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses”, Phil. Trans. Roy. Soc. London, A, Vol. 231 (1933), pp. 289-337.

J. Neyman and E. S. Pearson, Biometrika, Vol. 20A (1928), p. 264.
Then Wolfowitz takes the likelihood ratio to be the following function of the ranking $R$:

$$\Lambda(R) = \Big[\sup_{F_n \in \omega} P(R \mid F_n)\Big] \Big/ \Big[\sup_{F_n \in \Omega} P(R \mid F_n)\Big].$$

His modified likelihood ratio test then consists of applying the method of ranks (section 2) with $\Lambda(R)$ as the statistic, small values being regarded as significant. If $\Omega$ is a class of continuous $F_n$, all rankings $R \ne R_0$ have the same probability $1/s$ under the null hypothesis, while $P(R_0 \mid F_n) = 0$ for all $F_n \in \Omega$. Then the numerator of $\Lambda(R)$ is $1/s$, and we may thus use the denominator of $\Lambda(R)$ as statistic, with large values significant. Wolfowitz' modification has one advantage we do not always find with the usual parametric method: it always leads to similar regions, since it is a special case of the randomization method.

In applying his method to examples, Wolfowitz finds it necessary to resort each time to an approximation in calculating his statistic $\Lambda(R)$. Instead of taking the “sup” over $\Omega$ as in the definition, he takes it over a subclass $\Omega'$ of $\Omega$ which lends itself more easily to calculation. Thus in the problem of two samples with $p = 2$, whereas $\Omega$ is the class defined in section 4 with $F, G$ in $\Omega_2$, the class $\Omega'$ is the subclass of $\Omega$ obtained by further limiting $F, G$ as follows. The $x$-axis is divided up into a number of disjoint intervals, equal to the total number of runs in the sequence $V$ defined in connection with the Wald-Wolfowitz test in section 4. If the $j$-th run in $V$ is a run of 1's, the restriction is imposed that $G$ assign no probability to the $j$-th interval; if the $j$-th run is a run of 2's, $F$ is restricted in the same way. The intervals in which $F, G$ are permitted to assign positive probability then correspond in order and number to the two kinds of runs. With this restriction the (twice) modified likelihood ratio statistic is found to be

$$\sum_{i=1}^{2} \sum_{j} \left(u_{ij} \log u_{ij} - \log u_{ij}!\right),$$

where $u_{ij}$ is the number of elements in the $j$-th run of $i$'s ($i = 1, 2$). Large values are significant. For large samples the asymptotic distribution of the statistic falls out as a special case of a general theorem of Wolfowitz.
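Given the two samples, the displayed statistic depends only on the run lengths. The sketch below (Python; names are ours) computes it, using `math.lgamma(u + 1)` for $\log u!$. One way to see where the formula comes from (our own check, not taken from [50]): within $\Omega'$, the probability of a ranking is maximized by giving the $j$-th interval of 1's the mass $u_{1j}/n_1$, where $n_1$ is the size of the first sample, and similarly for the 2's. This yields the displayed sum up to terms depending only on the two sample sizes, which are dropped here. The code does not attempt the asymptotic distribution.

```python
import math


def run_lengths(xs, ys):
    """Lengths of the runs of 1's and of 2's in the combined ordered sample (1 = first sample)."""
    labels = [lab for _, lab in sorted([(v, 1) for v in xs] + [(v, 2) for v in ys])]
    runs = {1: [], 2: []}
    length = 1
    for prev, cur in zip(labels, labels[1:]):
        if cur == prev:
            length += 1
        else:
            runs[prev].append(length)
            length = 1
    runs[labels[-1]].append(length)  # close the final run
    return runs


def wolfowitz_statistic(xs, ys):
    """Sum over i = 1, 2 and runs j of (u_ij log u_ij - log u_ij!); large values are significant."""
    runs = run_lengths(xs, ys)
    return sum(u * math.log(u) - math.lgamma(u + 1) for i in (1, 2) for u in runs[i])
```

Completely alternating samples give the value 0, the smallest possible, since every run then has length one.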

In the same paper Wolfowitz obtained modified likelihood ratio tests for the 
problem of k samples and the problem of independence of two or more random 
variables. 

In his examples the author states that the maximizing $F_n$ in $\Omega'$ is “essentially the same” as the maximizing $F_n$ in $\Omega$, at least for the significant rankings $R$ and for large samples. The necessity of this approximation procedure is somewhat disturbing, as is the restriction to the method of ranks. Since it does not seem possible to give a definition of likelihood ratio tests sufficiently broad to include the non-parametric case, yet yielding the usual result in the parametric case, we are denied even the small comfort of saying that at least in special cases the method is known to yield optimum results. In some problems the set $\{R_e\}$ of rankings, corresponding to the set $\{w_e\}$ of regions in $W$ which serves to separate the $s$ points of the subpopulations $\{E'\}$ defined in section 2, is not unique (consider for instance the problem of two samples when the populations are bivariate), and in such cases the method as defined above would not give a unique result. These remarks are intended to point to the need for further investigation and cannot detract from the ingenuity of the method: it is the first general process that has been suggested for choosing one out of the welter of similar regions yielded by the randomization method.

13. Wald’s formulation of the general problem of statistical inference. A formulation of the general problem of statistical inference broad enough to cover the non-parametric case, and including estimation and tests as well as statistical problems classifiable under neither of these headings, has been given by Wald [37]. This formulation extends certain concepts he had applied earlier to the parametric case.

In this last section we shall permit ourselves a somewhat more abstract terminology and notation than before. As in section 2, $E = (X_1, \cdots, X_n)$ will denote the sample; $F_n(E)$, its c.d.f.; $W$, the $n$-dimensional Euclidean space of $E$, the sample space; and $\Omega$, the space of admissible $F_n$. Of central importance is a given class $\mathfrak{S}$ appropriate to the problem, $\mathfrak{S} = \{\omega_\beta\}$, whose members are (not necessarily disjoint) subsets of $\Omega$, $\Omega = \bigcup_\beta \omega_\beta$. To every $\omega_\beta \in \mathfrak{S}$ there corresponds a hypothesis $H(\omega_\beta): F_n \in \omega_\beta$, so that there is a 1:1 correspondence between the members of the set $\mathfrak{S}$ and those of the set $\{H(\omega_\beta)\}$ of hypotheses. The general problem of statistical inference, according to Wald, is the choice of a decision function $\Delta(E)$ mapping $W$ into $\mathfrak{S}$. For every $E \in W$ a decision function $\Delta(E)$ uniquely selects an element $\omega_\beta$ of $\mathfrak{S}$, $\omega_\beta = \Delta(E)$. Its statistical import is that when the sample point is observed to be $E$, we agree to accept the hypothesis $H(\omega_\beta)$ determined by $\Delta(E) = \omega_\beta$.

Before introducing any further definitions let us illustrate the preceding ones. In any problem of testing a hypothesis, the set $\mathfrak{S}$ has just two members $\omega_1$ and $\omega_2$, which we have heretofore denoted by $\omega$ and $\Omega - \omega$, respectively. The decision function $\Delta(E)$ then takes on just these two values; in fact, $\Delta(E) = \omega_2$ for $E$ in the critical region $w$ of the test, and $\Delta(E) = \omega_1$ for $E \in W - w$.

To illustrate the definitions in the case of point estimation, consider estimating the median $M$ of a univariate population with c.d.f. $F(x)$. $\Omega$ would be the class of $F_n$ of the form $\prod_{i=1}^{n} F(x_i)$ with, say, $F \in \Omega_4$ and $F'(M) \ne 0$ (which is sufficient to insure a unique $M$). The index $\beta$ could now be identified with $M$, so that its domain is the real line, and $\omega_\beta = \{F_n \mid M(F) = \beta\}$. The classes $\omega_\beta$ would be disjoint in this case, and each would contain an infinite number of $F_n$. The problem of estimating the unknown $M$ may be said to be the choice of a decision function $\Delta(E)$: when the sample point is $E$, we accept $H(\omega_\beta): F_n \in \omega_\beta$, where $\omega_\beta = \Delta(E)$, meaning in this case simply that we accept the statement that $M$ equals the $\beta$ determined by $\Delta(E)$.

A. Wald, “Contributions to the theory of statistical estimation and testing hypotheses”, Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.
Suppose next that instead of the point estimation of $M$ just discussed we are interested in the interval estimation of $M$. We define $\Omega$ as above, and now take the index $\beta$ to consist of a pair $a, b$ of real numbers. An interval estimate $a < M < b$ may be regarded as an acceptance of the hypothesis $H(\omega_{a,b}): F_n \in \omega_{a,b}$, where $\omega_{a,b}$ is the subclass of $\Omega$ consisting of all $F_n$ for which $M(F)$ lies in the interval $a < M < b$. The set $\mathfrak{S}$ now consists of all classes $\omega_{a,b}$ with $-\infty < a < b < +\infty$. Here, as in the general case of interval estimation, the classes $\omega_\beta$ of the set $\mathfrak{S}$ are not disjoint. The decision function $\Delta(E)$ adopted in section 8 is $\Delta(E) = \omega_{a,b}$ with $a = Z_k$, $b = Z_{n-k+1}$, where $Z_1 < Z_2 < \cdots < Z_n$ is a rearrangement of the coordinates $X_1, \cdots, X_n$ of $E$.
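The decision function of section 8 and its confidence coefficient can be sketched as follows (Python; names are ours). For continuous $F$ with median $M$, each $X_i$ falls below $M$ with probability $1/2$, so $Z_k < M < Z_{n-k+1}$ exactly when between $k$ and $n - k$ of the observations lie below $M$. The probability of this event, $\sum_{j=k}^{n-k} \binom{n}{j} 2^{-n}$, is therefore the same for every $F_n \in \Omega$.

```python
from math import comb


def median_interval(sample, k):
    """Decision function of section 8: accept omega_{a,b} with a = Z_k, b = Z_{n-k+1}."""
    z = sorted(sample)
    n = len(z)
    if not 1 <= k <= n // 2:
        raise ValueError("need 1 <= k <= n/2")
    return z[k - 1], z[n - k]


def interval_confidence(n, k):
    """Pr{Z_k < M < Z_{n-k+1}} for continuous F: a Binomial(n, 1/2) count lies in [k, n-k]."""
    return sum(comb(n, j) for j in range(k, n - k + 1)) / 2 ** n
```

For $n = 10$ and $k = 2$ this gives $1002/1024 \approx .979$.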

An example of a problem neither of estimation nor of testing would be the following. Let $\Omega$ be as above. Two real numbers $A$ and $B$ ($A < B$) are given, and it is required to decide on the basis of the sample $E$ to which of the three classes $-\infty < M < A$, $A \le M \le B$, $B < M < +\infty$ the unknown median $M$ belongs. Here the set $\mathfrak{S}$ would consist of three disjoint classes $\omega_1, \omega_2, \omega_3$, where $\omega_1$ is the subclass of $\Omega$ consisting of $F_n$ with $M(F) < A$, etc.

We return now to the general case. Before defining a “best” decision function $\Delta = \Delta^*$, Wald asks that there be a given weight function $\mathfrak{w}(F_n, \omega_\beta)$ defined on the product space $\Omega \times \mathfrak{S}$. The weight function $\mathfrak{w}(F_n, \omega_\beta)$ is a real-valued function evaluating the loss involved in accepting $H(\omega_\beta)$, the statement that the unknown c.d.f. of $E$ is a member of $\omega_\beta$, when the unknown c.d.f. is actually $F_n$. If $F_n \in \omega_\beta$ we make no error in accepting $H(\omega_\beta)$, and in this case $\mathfrak{w}$ is defined to be zero. Its value otherwise is required to be non-negative. In this theory the choice of the weight function is regarded as essentially not a mathematical problem; rather, the choice is to stem out of the very specific situation in which the statistical inference is to be made. In an industrial problem $\mathfrak{w}$ might be the financial loss incurred when a certain kind of error is made.

After $\mathfrak{w}$ is given, the decision functions $\Delta$ are to be restricted to the class for which $\mathfrak{w}(F_n, \Delta(E))$ is a Borel-measurable function of $E$ for all $F_n \in \Omega$; note that $\mathfrak{w}$ depends on $E$ only through $\Delta$, not through $F_n$. The expected value of $\mathfrak{w}$ for a particular $F_n$ is called the risk function; it depends of course on the decision function $\Delta$ and the weight function $\mathfrak{w}$ as well as on $F_n$. Denote it by

$$r(\Delta, \mathfrak{w} \mid F_n) = \int_W \mathfrak{w}(F_n, \Delta(E))\, dF_n(E).$$

Since the true $F_n$ is unknown, so in general will be the true value of the risk function associated with a particular decision function $\Delta$. We might call

$$r(\Delta, \mathfrak{w}) = \sup_{F_n \in \Omega} r(\Delta, \mathfrak{w} \mid F_n)$$

the maximum risk associated with the decision function $\Delta$. Wald defines $\Delta^*$ to be the “best” decision function relative to the weight function $\mathfrak{w}$ if the maximum risk $r(\Delta, \mathfrak{w})$ is a minimum for $\Delta = \Delta^*$. He points out that the “best” decision function might be defined as one which minimizes some weighted mean, taken over all $F_n \in \Omega$, of the risk function $r(\Delta, \mathfrak{w} \mid F_n)$, but that the above definition of the “best” decision function has certain advantages. Thus, under certain restrictions on $\Omega$ and $\mathfrak{w}$, the risk function $r(\Delta^*, \mathfrak{w} \mid F_n)$ is independent of $F_n \in \Omega$; that is, we then know the exact value of the risk, regardless of what the true $F_n$ may be. This is analogous to the desirable situations where confidence intervals are known, and the probability of a false statement (to the effect that the unknown quantity is in a given region when it is not) is then a constant independent of the unknown quantity.
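Wald's definitions can be made concrete by simulation. Every choice in the sketch below is hypothetical and ours: the three-class median problem with $A = -1$, $B = 1$; normal populations with unit variance; a decision function that simply classifies the sample median; and the weight $\mathfrak{w}$ equal to the number of classes separating the accepted class from the true one. The sketch estimates $r(\Delta, \mathfrak{w} \mid F_n)$ by Monte Carlo and takes the maximum over a finite set of medians in place of the supremum over $\Omega$. It therefore illustrates the quantities being compared; it does not construct Wald's $\Delta^*$.

```python
import random
import statistics

A, B = -1.0, 1.0  # hypothetical class boundaries, A < B


def class_of(median):
    """Index of the class containing the median: 1 if < A, 2 if A <= median <= B, 3 if > B."""
    if median < A:
        return 1
    if median > B:
        return 3
    return 2


def decide(sample):
    """A simple (not necessarily best) decision function: classify the sample median."""
    return class_of(statistics.median(sample))


def weight(true_class, accepted_class):
    """Hypothetical weight function: 0 for a correct decision, else how many classes apart."""
    return abs(true_class - accepted_class)


def risk(mu, n=25, reps=4000, seed=0):
    """Monte Carlo estimate of r(Delta, w | F_n) when F is normal with median mu, variance 1."""
    rng = random.Random(seed)
    truth = class_of(mu)
    losses = (weight(truth, decide([rng.gauss(mu, 1.0) for _ in range(n)]))
              for _ in range(reps))
    return sum(losses) / reps


def max_risk(mus, **kwargs):
    """Maximum of the risk over a finite family of medians, standing in for the sup over Omega."""
    return max(risk(mu, **kwargs) for mu in mus)
```

With these settings the risk is near 0 for a median well inside a class and near .5 for a median on a boundary, so the maximum risk is governed by the boundary cases.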

Wald’s theory is suggestive and formally very satisfying, but one would like to see some specific examples of its application to non-parametric cases. A discouraging aspect, not shared by the older Neyman-Pearson theory, lies in the very refinement that a decision function is declared best with respect to a very particular weight function $\mathfrak{w}$. An attractive possibility would be to impose a metric on $\Omega$ or on a related function space, and to let $\mathfrak{w}$ be the distance function. In the problem of two samples, for example, after metrizing $\Omega_p$, the weight $\mathfrak{w}$ assigned to accepting $H$ might be taken as the distance between $F$ and $G$ in the notation of section 4. A suitable choice of metric might yield a weight function appropriate to a large variety of situations. The difficulties of finding a distance function which is intuitively satisfactory and analytically tractable in calculating the risk function are no doubt formidable. The device of metrizing a space of distribution functions was used by Mann and Wald in a different connection [17], but their choice of distance function, while appropriate to their problem, would not be satisfactory here.

Also still lacking is any general theory relating the three concepts discussed in Part III. The following questions have been answered, at least for some specific examples, in the parametric case, but are still untouched in the non-parametric case: Are likelihood ratio tests consistent? Is there a simple weight function $\mathfrak{w}$ relative to which the likelihood ratio test becomes a “best” test, or asymptotically a “best” test? If a test is “best” relative to a given weight function, with respect to what set of alternatives is it consistent?

In conclusion let us emphasize the need for constructive methods of obtaining “good” and “best” tests and estimates in the non-parametric case. Recalling the history of the parametric case, we may judge that half the battle was the definition of “good” and “best” statistical inference. Progress in the non-parametric case has been made in the direction of definition, mainly by carrying over or modifying criteria originally advanced for the parametric case. However, besides criteria for “good” and “best” tests and estimates, we have in the parametric case a large body of constructive theory which may be applied in particular examples to yield the optimum tests or estimates; thus we have the Fisher theory of maximum likelihood statistics for point estimation, and the constructive theorems of the Neyman-Pearson theory for the existence of critical regions of types $A$, $A_1$, $B$, $B_1$, and the related types of “best” confidence intervals. The contrasting lack of any general constructive methods at present challenges us in the non-parametric theory.
ON THE THEORY OF SAMPLING FROM FINITE POPULATIONS

By Morris H. Hansen and William N. Hurwitz 
Bureau of the Census 

I— HISTORICAL BASIS FOR MODERN SAMPLING THEORY 

The theory for independent random sampling of elements from a population where the unit of sampling and the unit of analysis coincide was developed by Bernoulli more than 200 years ago. The theory that would measure the gains to be had from introducing stratification into sampling was indicated by Poisson a century later. Subsequently, Lexis systematized previous work and provided the theoretical basis for sampling clusters of elements.¹ The adaptation of the work of Bernoulli and Poisson to sampling from finite populations was summarized by Bowley in 1926 [1], approximately a century after the work of Poisson.

An impetus to sampling advancement, following some fundamental statistical 
contributions of Pearson, Fisher, and others, resulted from the work of Neyman 
when he published his paper in 1934 on the two different aspects of the repre- 
sentative method [8]. In that paper he introduced new criteria of the optimum 
use of resources in sampling, including the concept of optimum allocation of 
sampling units to different strata subject to the restriction that the sample have 
a fixed total number of sampling units. 

If, no matter how a sample be drawn, the cost were dependent entirely on the number of elements included in the sample, there would be little need for theory beyond the classical theories of Bernoulli and Poisson covering the independent random sampling of elements within strata, supplemented by the extension of the theory to finite populations and the extension to optimum allocation of sampling units. Very often, however, in statistical investigations it is extremely costly, if not impossible, to carry out a plan of independent random sampling of elements in a population. Such sampling, in practice, requires that a listing identifying all the elements of the population be available, and frequently this listing does not exist or is too expensive to get. Even if such a listing is available, the enumeration costs may be excessive if the sample is too widespread. Frequently also, there are other restrictions on the sample design, such as the requirement that enumerators work under the close supervision of a limited number of supervisors, and as a consequence the field operations must be confined to a limited number of administrative centers. Techniques such as cluster sampling [2, 3, 4, 5, 6, 7, 8, 10], subsampling, and double sampling [9], have been

¹ The sampling of clusters of elements refers to the sampling of units that contain more than one element. Examples of cluster sampling include the use of the city block or the county as the sampling unit when the purpose of the survey is to determine the properties of the population made up of individual persons or individual households. In these instances, the city block or county is referred to as the cluster of elements, and the individual person or household is referred to as the element.
developed with the aim of making most effective use of available resources, while keeping within existing administrative restrictions, and thus producing the maximum amount of information possible within these resources and restrictions. Neyman [8], Yates and Zacopanay [10], Cochran [2], Mahalanobis [7], and others have made important contributions in this regard.

We can illustrate a number of the developments indicated above in a simple 
but fairly general subsampling design. This design involves the sampling of 
clusters of elements from a stratified population and the subsampling of elements 
from each of the selected clusters, where the number of elements in each of the 
primary sampling units within a stratum is the same. 

Suppose we have a population made up of $L$ strata, with the $i$-th stratum containing $M_i$ primary sampling units of $N_i$ elements each. The individual element will be the subsampling unit. Let $X_{ijk}$ be the value of some characteristic of the $k$-th element of the $j$-th primary sampling unit in the $i$-th stratum, and assume that the character to be estimated is

$$\bar{X} = \frac{\sum_i \sum_j \sum_k X_{ijk}}{\sum_i M_i N_i}. \qquad (1)$$

For example, if $\bar{X}$ is the average income per household in a given city, $X_{ijk}$ might be the income of the $k$-th household in the $j$-th city block in the $i$-th ward, where the household is the subsampling unit, the city block is the primary sampling unit, and the stratification has been by wards. Suppose, further, that we sample $m_i$ primary units from the $i$-th stratum, and subsample $n_i$ elements from each of the primary units sampled from that stratum.

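The “best linear unbiased estimate” [8] of $\bar{X}$ from the sample will be

$$\bar{X}' = \frac{\sum_i \dfrac{M_i N_i}{m_i n_i} \sum_j \sum_k X_{ijk}}{\sum_i M_i N_i}, \qquad (2)$$

where the sums over $j$ and $k$ now extend only over the sampled primary units and elements, and the variance of $\bar{X}'$ is

$$\sigma^2_{\bar{X}'} = \sum_i \left(\frac{M_i N_i}{\sum_i M_i N_i}\right)^2 \left[\frac{M_i - m_i}{M_i - 1}\cdot\frac{1}{m_i}\cdot\frac{\sum_j (\bar{X}_{ij} - \bar{X}_i)^2}{M_i} + \frac{N_i - n_i}{N_i - 1}\cdot\frac{1}{m_i n_i}\cdot\frac{\sum_j \sum_k (X_{ijk} - \bar{X}_{ij})^2}{M_i N_i}\right], \qquad (3)$$

where $\bar{X}_{ij} = \sum_k X_{ijk}/N_i$ and $\bar{X}_i = \sum_j \sum_k X_{ijk}/M_i N_i$.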
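Formulas (2) and (3) can be checked by brute force on a toy case. The sketch below (Python; names are ours) assumes simple random sampling without replacement at both stages, so that every two-stage sample is equally likely. It builds a tiny made-up population with two strata, enumerates every possible sample, and compares the exact mean and variance of $\bar{X}'$ over those samples with $\bar{X}$ from (1) and $\sigma^2_{\bar{X}'}$ from (3).

```python
from itertools import combinations, product


def estimate(sample, M, N, m, n):
    """Formula (2); sample[i] holds m_i lists of n_i sampled values from stratum i."""
    numerator = sum(M[i] * N[i] / (m[i] * n[i]) * sum(sum(unit) for unit in sample[i])
                    for i in range(len(M)))
    return numerator / sum(Mi * Ni for Mi, Ni in zip(M, N))


def variance(pop, m, n):
    """Formula (3) for pop[i][j][k] = X_ijk, with N_i elements in every unit of stratum i."""
    grand = sum(len(s) * len(s[0]) for s in pop)          # sum_i M_i N_i
    v = 0.0
    for i, stratum in enumerate(pop):
        Mi, Ni = len(stratum), len(stratum[0])
        unit_means = [sum(u) / Ni for u in stratum]       # Xbar_ij
        stratum_mean = sum(unit_means) / Mi                # Xbar_i
        between = sum((x - stratum_mean) ** 2 for x in unit_means) / Mi
        within = sum((x - um) ** 2 for u, um in zip(stratum, unit_means) for x in u) / (Mi * Ni)
        v += (Mi * Ni / grand) ** 2 * (
            (Mi - m[i]) / (Mi - 1) * between / m[i]
            + (Ni - n[i]) / (Ni - 1) * within / (m[i] * n[i]))
    return v


def all_samples(pop, m, n):
    """Every two-stage sample: m_i units without replacement, then n_i elements from each."""
    per_stratum = []
    for i, stratum in enumerate(pop):
        options = []
        for units in combinations(stratum, m[i]):
            for picks in product(*(combinations(u, n[i]) for u in units)):
                options.append([list(p) for p in picks])
        per_stratum.append(options)
    return product(*per_stratum)
```

Strata are sampled independently, which is why (3) is a weighted sum of per-stratum terms and why the enumeration takes a Cartesian product across strata.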


These formulas have no practical utility in designing samples unless there are, in addition, some considerations of differential costs. Cost relationships sometimes may be stated explicitly as a function of the $m_i$ and the $n_i$, or, what is frequently the case, they may be approximated sufficiently through intuition and speculation to guide one to a reasonable decision among the various alternatives implied by the design.
If we know the cost function, we proceed to determine the values of the $m_i$ and the $n_i$ that make $\sigma^2_{\bar{X}'}$ a minimum for a fixed total expenditure, and also subject to any other restrictions that may be imposed. This theory provides a basis for determining the optimum allocation of the sampling ratios to the various strata, and to primary and secondary sampling units within each stratum.
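As an illustration only, suppose a single stratum and a hypothetical linear cost $C = c_1 m + c_2 m n$, with $c_1$ the cost per primary unit and $c_2$ the cost per element; the text does not specify any particular cost function. The sketch below (Python; names are ours) searches all integer pairs $(m, n)$ within the budget for the one that minimizes the single-stratum form of (3). With several strata, the same search would run over all $(m_i, n_i)$ jointly.

```python
def stratum_variance(m, n, M, N, s2_between, s2_within):
    """Formula (3) for a single stratum, where the weight (M N / sum M N)^2 equals 1.

    s2_between = sum_j (Xbar_j - Xbar)^2 / M; s2_within = sum_j sum_k (X_jk - Xbar_j)^2 / (M N).
    """
    return ((M - m) / (M - 1) * s2_between / m
            + (N - n) / (N - 1) * s2_within / (m * n))


def optimum_allocation(budget, c1, c2, M, N, s2_between, s2_within):
    """Integer (m, n) minimizing the variance subject to the hypothetical cost c1*m + c2*m*n <= budget."""
    best = None
    for n in range(1, N + 1):
        for m in range(1, M + 1):
            if c1 * m + c2 * m * n > budget:
                break  # the cost grows with m, so larger m are also unaffordable
            v = stratum_variance(m, n, M, N, s2_between, s2_within)
            if best is None or v < best[0]:
                best = (v, m, n)
    if best is None:
        raise ValueError("the budget does not cover even one element")
    return best[1], best[2]
```

For example, with $M = 100$, $N = 10$, between and within variances 4 and 16, cheap primary units and dear elements ($c_1 = 1$, $c_2 = 100$, budget 1000) lead to $m = 9$, $n = 1$. Dear primary units and cheap elements ($c_1 = 1000$, $c_2 = 1$, budget 5000) lead instead to $m = 4$, $n = 10$.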

Such developments, however, must be regarded as only the first step in sample 
design. We cannot go forward if we only know that the optimum sample design 
is some particular mathematical function of the population parameters and the 
cost factors; we need also to know something about the relative magnitudes of 
certain parameters in the particular populations under consideration, as well as 
something about the costs associated with the various sampling and estimating 
operations. 

Thus, considerable work in recent years has been done on the study of the 
relative magnitudes of variances and covariances between and within various 
types of sampling units and on the study of costs and types of cost functions 
that operate. Work is being done in this field by the Department of Agriculture 
in connection with sampling for agricultural items, and is being done also in the 
Bureau of the Census, and in other places. 

II— THE DIRECTION OF MORE RECENT DEVELOPMENTS

The sampling procedure indicated above involves as a first step the definition 
of the system of sampling, such as whether the sampling method will involve 
cluster sampling, double sampling, or subsampling, and along with this the 
definition of the stratification and the sampling units. The second step is that 
of determining the method of estimation, together with the allocation of the sam- 
pling units. 

The first step, that of defining the sampling system, is taken with a view to administrative feasibility and sampling efficiency, but no simple procedure exists which leads one uniquely to the selection of a system, except perhaps by the impractical method of listing and examining all possible alternatives and accepting one on some criterion of best. However, given the definition of a population character to be estimated, and a sampling system, a simple procedure is available that will provide a unique solution to the second step, provided we accept some criterion as to what “best” means, such as the best linear unbiased estimate, subject to any cost or administrative restrictions that may be imposed. Such criteria lead us to both our estimating procedure and our allocation of sampling within the sampling system defined.

While no theory with practical applicability has been developed which indi- 
cates a “best” system of sampling, and at the same time indicates the “best” 
estimating procedure and sampling allocation, some progress in the choice of 
improved sampling systems and estimating procedures has been made. The 
developments in the following two directions appear to us to be particularly 
pertinent. 

1. Modifications in some of the fairly generally accepted criteria of good sample estimates have led to more reliable sample results for some types of sampling systems (some of these are mentioned in Sec. III);

2. Some principles are emerging that have led to improved determination of the sampling units, the strata, and other aspects of the sampling system (some efforts at formulating such principles are reported in Secs. IV, V, and VI).

We shall summarize, principally, some of the recent work in the Census, and in so doing shall mention some work of others that is closely related. Most of the work that we shall summarize relates to problems where the sampling units are clusters of elements and vary in size.


III— MODIFICATIONS IN THE CRITERIA FOR GOOD ESTIMATES


The estimate given in the general subsampling problem formulated in Sec. I satisfies the criterion of the “best linear unbiased estimate.” Also, as far as our experience has indicated, this estimate is frequently the most efficient one for populations of the form described, that is, where the number of elements in each sampling unit within a stratum is the same. However, if the numbers of elements differ between sampling units, a biased but consistent estimate can frequently be found that has a substantially smaller mean square error² than the best linear unbiased estimate.

For example, consider the case where clusters of elements are the sampling units
and we want to estimate

$$\bar{X} = \frac{\sum_{i=1}^{M} X_i}{\sum_{i=1}^{M} N_i},$$

the average value per element of some specified characteristic. Here $M$ is the
number of sampling units in the population, $X_i$ is the aggregate value of the
specified character for all elements in the $i$-th cluster, and $N_i$ is the number
of elements in that cluster. The joint distribution of $X_i$ and $N_i$ is unknown,
but $\sum_{i=1}^{M} N_i = N$ is known. Under these circumstances the “best linear
unbiased estimate” of $\bar{X}$ from a sample of $m$ clusters turns out to be
$\frac{M}{m}\sum_{i=1}^{m} X_i / N$. However, a smaller mean square error is often
obtained by the use of a ratio estimate from the sample such as
$\sum_{i=1}^{m} X_i / \sum_{i=1}^{m} N_i$. This estimate is excluded by the “best
linear unbiased” criterion because it is nonlinear and biased, although the bias
is usually negligible and the estimate is consistent. Since the best linear
unbiased estimate of $\bar{X}$ requires knowledge of $N$, the sample ratio has a
further advantage in that it can be used even when $N$ is not known.
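The comparison above can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not the authors' computation: the population (cluster sizes drawn uniformly from 5 to 50, cluster totals roughly proportional to size) and the names `simulate`, `reps` and `seed` are hypothetical. It draws repeated simple random samples of $m$ clusters without replacement and compares the empirical mean square errors of $\frac{M}{m}\sum X_i/N$ and $\sum X_i/\sum N_i$.

```python
import random

def simulate(M=200, m=20, reps=2000, seed=1):
    """Empirical MSEs of two estimates of the mean per element, X-bar.

    Hypothetical population: cluster sizes N_i vary from 5 to 50 and cluster
    totals X_i are roughly proportional to N_i.
    """
    rng = random.Random(seed)
    sizes = [rng.randint(5, 50) for _ in range(M)]            # N_i
    totals = [3.0 * n + rng.gauss(0.0, 5.0) for n in sizes]   # X_i
    N = sum(sizes)                                            # known total size
    true_mean = sum(totals) / N                               # X-bar

    sse_unbiased = sse_ratio = 0.0
    for _ in range(reps):
        idx = rng.sample(range(M), m)        # m clusters, without replacement
        x = sum(totals[i] for i in idx)
        n = sum(sizes[i] for i in idx)
        unbiased = (M / m) * x / N           # linear and unbiased, but needs N
        ratio = x / n                        # nonlinear, slightly biased, no N needed
        sse_unbiased += (unbiased - true_mean) ** 2
        sse_ratio += (ratio - true_mean) ** 2
    return true_mean, sse_unbiased / reps, sse_ratio / reps

true_mean, mse_unbiased, mse_ratio = simulate()
print(f"X-bar = {true_mean:.3f}, MSE unbiased = {mse_unbiased:.5f}, MSE ratio = {mse_ratio:.5f}")
```

Because the totals here are nearly proportional to cluster size, the ratio estimate's mean square error comes out far smaller; if the $X_i$ were unrelated to the $N_i$, that advantage could shrink or reverse.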

* In this paper the terms “mean square error” and “variance” are used interchangeably
to refer to $E(X' - X)^2$ when $EX'$ is equal to $X$, the population character to be
estimated. When $EX'$ is not equal to $X$, however, $E(X' - X)^2$ will be referred to
only as the “mean square error.” Since, under these latter circumstances,
$E(X' - X)^2 = E(X' - EX')^2 + (EX' - X)^2$, the mean square error is equal to the
variance of $X'$ plus the contribution due to the bias.

A recent paper by Cochran [3] gives a number of consistent though biased estimates
of $\bar{X}$ that make use of the least squares estimate of the linear regression of
$X_i$ on $N_i$. These estimates generally have a smaller mean square error than
either the best linear unbiased estimate or the simple ratio estimate given above.
However, they require knowledge of $N$, as does the best linear unbiased estimate,
and in addition may require detailed tabulations and considerable clerical work
as a part of the estimating process.
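As a hedged illustration of this kind of estimate, the function below implements the standard textbook regression estimate, which is not necessarily the specific form given in [3]. It fits the least squares slope of $X_i$ on $N_i$ within the sample and moves the sample mean cluster total to the known mean size $N/M$. The function name and arguments are hypothetical; as the text notes, the method needs $N$.

```python
def regression_estimate(sample_totals, sample_sizes, M, N):
    """Regression estimate of the mean per element, X-bar = sum X_i / N.

    sample_totals: X_i for the m sampled clusters
    sample_sizes:  N_i for the same clusters (must not all be equal)
    M: number of clusters in the population
    N: total number of elements in the population (must be known)
    """
    m = len(sample_totals)
    x_bar = sum(sample_totals) / m
    n_bar = sum(sample_sizes) / m
    s_xn = sum((x - x_bar) * (n - n_bar) for x, n in zip(sample_totals, sample_sizes))
    s_nn = sum((n - n_bar) ** 2 for n in sample_sizes)
    b = s_xn / s_nn                              # least squares slope of X_i on N_i
    mean_total = x_bar + b * (N / M - n_bar)     # adjusted mean cluster total
    return M * mean_total / N                    # estimated total divided by N

# Hypothetical population in which X_i = 2 + 3 N_i exactly
sizes = [10, 20, 30, 40, 50]
totals = [2 + 3 * n for n in sizes]
est = regression_estimate(totals[:3], sizes[:3], M=len(sizes), N=sum(sizes))
print(est, sum(totals) / sum(sizes))
```

When the relation between $X_i$ and $N_i$ is exactly linear, as here, the estimate reproduces $\bar{X}$ from any sample; with scatter about the line it is consistent but slightly biased.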

Both types of biased estimates mentioned above are consistent, and usually 
have a smaller mean square error than the best linear unbiased estimate for 
sampling systems in which the sampling units vary in size. Thus, improved 
sample estimates will be obtained by modifying the “best linear unbiased
estimate” criterion to admit estimates that are nonlinear and biased but consistent,
provided they have a smaller mean square error than the best linear unbiased estimate.

IV— IMPROVEMENTS IN THE SPECIFICATIONS OF 
SAMPLING SYSTEMS 

A great deal can be done to improve sampling designs through improved speci- 
fication of the sampling system even though one has only a limited knowledge of 
the manner in which the population is likely to be made up, and no specific
information concerning the particular population parameters involved (see 
Sec. VI). 

1. The sizes of sampling units. A number of recent investigations have
indicated the desirability, with costs considered, of keeping the size of cluster
very small when clusters of elements are used as the sampling unit in field
surveys [2, 5, 6, 7, 8]. It is important to point out, however, that this principle is
not necessarily applicable to subsampling systems, and that the use of large 
clusters as the primary sampling units in a system involving subsampling may 
yield distinct gains over the use of smaller clusters without subsampling. More- 
over, one of the often recurring problems in large-scale studies is the designing of 
sample surveys within stringent administrative restrictions on the number of 
different locations in which operations can be carried on. Under such restric- 
tions a procedure commonly used is to choose a limited number of existing 
political units, such as counties, as the primary sampling units, and then to sub- 
sample units such as blocks, small rural areas, or households. Under the circum- 
stances, if the numbers of primary subsampling units to be included in the 
sample are assumed to be held constant, the use of larger primary sampling units 
than the existing political units would have the effect of decreasing the sampling 
variance. 

The advantage of using large primary units in subsampling is evident in the 
simple case when the original units, each having the same number of elements, 
are consolidated to form half as many enlarged primary units, each twice as large 
as the original units. The variance between the enlarged primary units will be
$\sigma_{2b}^2 = \frac{1}{2}\sigma_{1b}^2(1 + \rho)$, where $\sigma_{1b}^2$ is the variance between
the original primary units, and $\rho$ is the correlation between the units that are
paired. The correlation coefficient will be close to zero (exactly equal to
$-1/(M - 1)$, where $M$ is the number of original primary units) if the pairing is
done at random, and it follows that the variance between counties is then cut at
least in half. Ordinarily, $\rho$ will be greater than zero if the paired units are
required to be contiguous. However, through choosing for consolidation those
contiguous units that are as different as possible, $\rho$ is made as small as
possible, and in some instances this minimum value may even be negative. In any
event, the smaller the value that $\rho$ takes on, the greater the reduction of the
sampling variance between primary units from the use of enlarged units. While the
sampling variance within primary units is increased by such consolidations, the
increase is slight, and the total sampling variance is almost invariably decreased
(see Appendix, Section 1).
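Both facts used above, $\sigma_{2b}^2 = \frac{1}{2}\sigma_{1b}^2(1+\rho)$ and $\rho = -1/(M-1)$ under random pairing, can be verified exactly on a small set of unit means. This sketch assumes variances computed with divisor equal to the number of units, takes $\rho$ as the within-pair correlation about the overall mean, and represents random pairing by averaging over every possible pairing. The values and function names are hypothetical.

```python
from fractions import Fraction

def pair_stats(values, pairs):
    """Variance of original units, variance of pair means, and within-pair rho."""
    M = len(values)
    mean = Fraction(sum(values), M)
    var1 = sum((v - mean) ** 2 for v in values) / M                  # sigma_1b^2
    pair_means = [Fraction(values[i] + values[j], 2) for i, j in pairs]
    var2 = sum((z - mean) ** 2 for z in pair_means) / len(pairs)     # sigma_2b^2
    cov = sum((values[i] - mean) * (values[j] - mean) for i, j in pairs) / len(pairs)
    return var1, var2, cov / var1

def all_pairings(items):
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        for tail in all_pairings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail

values = [3, 8, 1, 9, 4, 6]            # hypothetical means of M = 6 original units
pairings = list(all_pairings(list(range(len(values)))))
rhos = []
for p in pairings:
    var1, var2, rho = pair_stats(values, p)
    assert var2 == var1 * (1 + rho) / 2     # sigma_2b^2 = (1/2) sigma_1b^2 (1 + rho)
    rhos.append(rho)
print(len(pairings), sum(rhos) / len(rhos))  # average rho over all pairings: -1/(M - 1)
```

Choosing dissimilar partners corresponds to picking a pairing with small or negative `rho`, which gives the smallest `var2`.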

The restriction on extending the consolidation of primary units is introduced by 
the increased cost of subsampling within larger and larger areas. This increased 
cost is to be weighed against the decreased variance. If the cost restriction 
were not sufficiently severe, consolidation would proceed to the point of eliminat- 
ing the use of primary sampling units altogether, and the subsampling units 
would be selected independently throughout the entire stratum. 

2. Subsampling where the primary units are of unequal size. Use of proba- 
bility proportionate to size in subsampling. A subsampling system frequently 
followed, whether or not the primary sampling units vary in size, involves the 
selection of one or more primary units from each stratum with the probability 
of selection the same for each primary unit in the stratum, and the subsampling 
of a fixed proportion of the subsampling units from the selected primary unit.
When the primary units vary in size this subsampling system has some ad- 
ministrative disadvantages that arise because the number of subsampling units 
to be included in the sample will vary with the number of elements in the se- 
lected primary unit. (The term “size” of sampling unit as used in this paper 
refers to the number of elements in the sampling unit.) 

The disadvantages in the above system have led in some instances to the speci- 
fication of a second subsampling system in which, although the primary units 
were selected with equal probability, the subsampling has been of a constant 
number rather than of a constant proportion. 

A third subsampling system that can be recommended over both the above 
systems is to make the probability of selection of a primary unit proportionate 
to its size and then to subsample a constant number of subsampling units. 
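To make the contrast among the three systems concrete, the sketch below computes the overall probability that an element of each primary unit enters the sample when one primary unit is drawn from a stratum. It assumes hypothetical unit sizes, exact arithmetic with `Fraction`, and a subsample (constant number `n` or constant proportion `f`) that never exceeds a unit's size; the function name is illustrative only.

```python
from fractions import Fraction

def inclusion_probabilities(sizes, n=None, f=None, pps=False):
    """Overall element inclusion probability in each primary unit of one stratum.

    sizes: numbers of elements N_hj in the primary units (hypothetical)
    pps:   select the unit with probability N_hj / N_h instead of 1 / Q_h
    n:     subsample a constant number n of elements, or
    f:     subsample a constant proportion f of elements
    """
    total, Q = sum(sizes), len(sizes)
    probs = []
    for size in sizes:
        p_unit = Fraction(size, total) if pps else Fraction(1, Q)
        p_within = Fraction(n, size) if n is not None else Fraction(f)
        probs.append(p_unit * p_within)
    return probs

sizes = [40, 100, 160]                                            # hypothetical N_hj
equal_prop = inclusion_probabilities(sizes, f=Fraction(1, 10))    # first system
equal_const = inclusion_probabilities(sizes, n=20)                # second system
pps_const = inclusion_probabilities(sizes, n=20, pps=True)        # recommended system
print(equal_prop, equal_const, pps_const)
```

The first system gives every element the same probability, but its sample size ($f N_{hj}$) varies with the unit drawn; the second fixes the sample size but favors elements in small units; only the recommended system has both a fixed subsample size and equal element probabilities.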

We shall assume that for all thjiee systems only one primary unit is selected 
from each stratum. Stratification to this degree leads to a smaller sampling 
variance than does less extensive stratification. For simplicity in making com- 
parisons, we shall assume, furthermore, that the subsampling unit is the element 
of analysis and that the sample estimate used is of the form Si' ^ ZNhSk/'SNh 
where Sh is the sample average, for the hrth stratum, of the character being 
estimated, and Nh is the size of that stratum. This estimate, which is frequently 
used, is biased for the first two systems but unbiased for the recommended sys- 
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tern. However, an unbiased estimate, say the “best’’ linear unbiased estimate 
for the first two systems generally has a much larger mean square error than the 
biased estimates used in these comparisons and hence has not been considered in 
the comparisons which follow (see Sec. VII, footnote 7). 

The first two subsampling systems mentioned are about equally efficient when 
the number of subsampling units drawn from each primary unit is reasonably 
large, but each will usually have a larger mean square error than will the recom- 
mended system. The difference between the mean square errors of either of the 
first two and that of the recommended design is given approximately by 

(4) ~ Z [Z P>.,Nh - Z 

where, within the $h$-th stratum, $N_{hj}$ is the number of elements in the $j$-th primary
sampling unit, $\bar{N}_h$ is the average size of primary sampling unit, $Q_h$ is the number
of primary sampling units, $\rho_{hj}$ is the intra-class correlation between elements
within the $j$-th unit and $\sigma_h^2$ is the variance between individual elements within
the stratum; $L$ is the number of strata. (See Section 2 of the Appendix for the
development of this difference.)

This difference, which is a multiple of the average covariance between the
$N_{hj}$ and $\rho_{hj}$, will be positive if $N_{hj}$ and $\rho_{hj}$ are negatively correlated, and this is
exactly the situation that exists in most practical problems we have encountered
in sampling for social and economic statistics (see Sec. VI).

The reduction in the mean square error arises because the recommended de- 
sign provides a more nearly optimum allocation of sampling as between large 
and small sampling units than do the other two. It might be possible, of course, 
as another alternative, to stratify the primary units by size and then allocate 
sampling to the various strata on the basis of optimum sampling considerations.
However, this would mean that some other and perhaps more important modes 
of stratification would be sacrificed, and moreover, the optimum allocation of 
sampling between the larger and smaller units could only be guessed at in most 
practical problems. Furthermore, it usually is not possible to stratify on size 
to the point that there is no variation in the sizes of units within a stratum. 

The sample estimate from the recommended system is unbiased whereas the 
estimates from the other two are usually biased, and sometimes fairly seriously 
so. (For a proof of this statement see Appendix, Section 1, and see also Sec. 
VII for a numerical illustration.) 

The use of probability proportionate to size serves to decrease only the sam- 
pling variation between primary units and has very little effect on the sam- 
pling variance within. Therefore, the recommended design shows its greatest 
advantage over the two alternatives when the contribution of the mean square 
error between primary units to the total mean square error is large. 

Ordinarily, the actual sizes of the primary sampling units will not be known, 
but numbers may be known that are highly correlated with the sizes. For 
example, ordinarily we will not know the populations of blocks or of cities or 
counties at the time a sample is taken, but we may know their populations at the
preceding census. Under these circumstances the primary units may be sampled 
with probabilities proportionate to the previously known (or their estimated) 
sizes, but if this is done the subsampling is to be modified in order to take account 
of the changes in the sizes between the two dates. If the actual sizes are known, 
the constant number taken from the selected primary unit in the $h$-th stratum is
$n_h = t_h N_h$, where $t_h$ is the sampling ratio assigned to the stratum, and $N_h$ is the
total number of elements in the stratum. The subsampling ratio within the
selected primary unit, therefore, is $t_h N_h / N_{hj}$, where $N_{hj}$ is the number of
elements in the selected unit. On the other hand, if there is available only a
measure of size $P_{hj}$ highly correlated with the actual sizes of the units $N_{hj}$, and if
the probability of selection of the primary unit has been proportionate to the $P_{hj}$,
the subsampling ratio in the selected primary unit will be equal to $t_h P_h / P_{hj}$,
where $P_h$ is the measure of size of the entire stratum, and $P_{hj}$ is the measure of
size of the selected primary unit. The variance of a sample estimate where
measures of size are used is given subsequently in this paper (see Eq. (9)).
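A short sketch of the measure-of-size version, using hypothetical values of $P_{hj}$, $N_{hj}$ and $t_h$: the unit is drawn with probability $P_{hj}/P_h$ and subsampled at the ratio $t_h P_h / P_{hj}$, so every element has overall probability $t_h$ whatever the unit's current size. Only the expected number of elements drawn, $t_h P_h N_{hj}/P_{hj}$, reflects how much the unit has grown or shrunk since the measure was taken. The function name is illustrative.

```python
from fractions import Fraction

def design_with_measures(P, N, t):
    """For each primary unit j: (overall element probability, expected subsample size).

    P: measures of size P_hj (e.g., counts from an earlier census), hypothetical
    N: actual current sizes N_hj, unknown when the unit is selected, hypothetical
    t: overall sampling ratio t_h assigned to the stratum
    """
    P_h = sum(P)
    rows = []
    for P_hj, N_hj in zip(P, N):
        p_select = Fraction(P_hj, P_h)       # selection proportionate to the measure
        ratio = t * Fraction(P_h, P_hj)      # subsampling ratio t_h P_h / P_hj
        rows.append((p_select * ratio, ratio * N_hj))
    return rows

rows = design_with_measures(P=[100, 300, 600], N=[120, 270, 660], t=Fraction(1, 50))
print(rows)
```

Had the units not changed size ($N_{hj} = P_{hj}$), each would yield $t_h P_h = 20$ elements; growth or decline shows up only in the subsample size, not in the element probabilities.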

3. The use of area substratification within primary strata in a subsampling 
system. Another modification, which will be called area substratification 
within primary strata, may be particularly useful where a relatively small sample 
is required from a population covering a large area, and where operations must 
be confined to a limited number of centers. 

Some preliminary remarks are necessary before area substratification can be
explained. Area substratification requires (a) that the entire population to be 
sampled be divided into areas that will serve as primary sampling units; (b) that 
these units be further subdivided into a number of sub-areas; and (c) that certain 
summary statistical information be available for each of the sub-areas in advance 
of drawing the sample. The information that must be known for the sub-areas 
includes a reasonably good measure of their sizes (perhaps the total population, 
total dwelling units, or total farms) and other information which is indicative of 
the characteristics of the area, such as whether predominantly farm or nonfarm, 
predominantly white or colored, etc. The sub-areas, when grouped into
homogeneous classes, will serve only to determine the substrata described
subsequently, and will not ordinarily serve as the subsampling units, which may be
defined independently of the sub-areas.

The definition of the primary sampling units and the classification of them
into strata proceed as indicated earlier, with the primary units made as internally
heterogeneous as possible within strata that are as homogeneous as possible. It
will be assumed that only one primary unit is sampled from each stratum, and
that the probability of selecting the $j$-th primary unit within the $h$-th stratum is
proportionate to $P_{hj}$, where $P_{hj}$ is the measure of size of the primary unit and is
equal to the sum of the measures of size of the sub-areas that it contains. It will
be assumed, also, that $t_h$, the over-all sampling ratio to be used within the $h$-th
stratum, has been determined for all strata on the basis of considerations of
optimum allocation.
The introduction of area substratification within primary strata may then be
accomplished as follows:

(a) The sub-areas within each primary stratum are classified into substrata
on the basis of their characteristics. (For example, they may be classified
into predominantly farm and predominantly nonfarm sub-areas, and 
these further classified on the basis of the average size of farm or average 
rental value of the dwelling units. In such a case, the sub-areas within 
the primary stratum that are predominantly farm and that have average 
rental values lying within a specified interval constitute a substratum.) 

(b) The sub-areas within the primary unit selected from each primary stratum
are classified into the same substrata. 

(c) Subsampling units are defined within each of the substrata within the 
selected primary units. The number of subsampling units defined within 
that part of the i-th substratum that is contained within the j-th primary 
unit is denoted by $M_{hij}$. (Various types of subsampling units may be
defined, such as the individual person, farm, dwelling unit, or structure, a 
very small area, etc. The subsampling units need be defined only within 
the selected primary sampling units.) 

(d) The number of subsampling units to be included in the sample from the
$i$-th substratum within the selected ($j$-th) primary sampling unit is

$$m_{hij} = M_{hij}\, t_h\, P_{hi} / P_{hij}, \qquad (5)$$

where $P_{hij}$ is the measure of size of that part of the $i$-th substratum that
lies within the $j$-th primary unit, and $P_{hi} = \sum_j P_{hij}$ is the sum of the
measures of size of the sub-areas contained in the $i$-th substratum of the
$h$-th primary stratum. This method of allocating the subsampling provides
that the subsample drawn from the selected primary unit is representative,
so far as possible, of the entire stratum, rather than of the particular
primary unit that happens to be included in the sample from that
stratum. To illustrate, suppose the numbers of persons in sub-areas from
the 1940 census are used as the measures of their sizes, and that the
sub-areas are classified into substrata on the basis of their characteristics in
1940 as indicated by the 1940 Decennial Census of Population. The
allocation of the subsampling indicated above then provides that if the
proportion of the total population residing in sub-areas that are
predominantly farm is 30 percent, the sample will be drawn in such a manner
that 30 percent of the 1940 population expected in the sample would be
from the predominantly farm sub-areas, even though, in the selected
primary sampling unit, perhaps only 15 percent of the 1940 population
might reside in such areas.

(e) The population character to be estimated is

$$X = \sum_{h}^{L} \sum_{i}^{S_h} \sum_{j}^{Q_h} \sum_{k}^{M_{hij}} X_{hijk}, \qquad (6)$$

where $X_{hijk}$ is the aggregate value of a specified characteristic for all of the
elements contained within the $k$-th subsampling unit in the $i$-th substratum
of the $j$-th primary unit; $S_h$ is the number of substrata and $Q_h$ is the
number of primary units in the $h$-th primary stratum; and $L$ is the number
of primary strata. ($X$ might be the total number of workers in the
United States, or the total number of farm laborers, etc.) An estimate of
$X$ from the sample is

$$X' = \sum_{h}^{L} \frac{1}{t_h} \sum_{i}^{S_h} \sum_{k}^{m_{hij}} X_{hijk}. \qquad (7)$$

No summation over $j$ is involved, because only one primary unit is drawn
from the $h$-th stratum. This is a very simple estimate, involving a sum
weighted only at the primary strata level. If the $t_h$ are all set equal to
$t$, i.e., if a constant proportion is sampled from each stratum, the estimate
becomes merely the total number of elements in the sample having the
specified characteristic multiplied by $1/t$, the reciprocal of the sampling
ratio.
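The allocation (5) and the estimate (7) can be sketched for one primary stratum using hypothetical numbers that match the illustration in step (d): the stratum's 1940 population is 30 percent farm, while the selected unit is only 15 percent farm. The names `allocate` and `estimate_total` are illustrative, and in practice the $m_{hij}$ would have to be rounded to whole numbers, which this sketch ignores.

```python
from fractions import Fraction

def allocate(M_unit, P_unit, P_stratum, t):
    """Eq. (5): m_hij = M_hij * t_h * P_hi / P_hij for each substratum i."""
    return {i: M_unit[i] * t * Fraction(P_stratum[i], P_unit[i]) for i in M_unit}

def estimate_total(sample_sums, t):
    """Eq. (7): X' = sum over strata h of (1 / t_h) * (sample total in stratum h).

    sample_sums: {h: sum of X_hijk over the sampled subsampling units in stratum h}
    t:           {h: sampling ratio t_h}
    """
    return sum(total / t[h] for h, total in sample_sums.items())

t = Fraction(1, 100)
P_stratum = {"farm": 30_000, "nonfarm": 70_000}   # P_hi, summed over all units j
P_unit = {"farm": 1_500, "nonfarm": 8_500}        # P_hij in the selected unit
M_unit = {"farm": 300, "nonfarm": 1_700}          # M_hij subsampling units defined

m = allocate(M_unit, P_unit, P_stratum, t)
expected_P = {i: m[i] / M_unit[i] * P_unit[i] for i in m}   # expected 1940 pop. in sample
farm_share = expected_P["farm"] / sum(expected_P.values())
print(m, farm_share)

# Eq. (7) with a common ratio t: X' is simply the sample total divided by t
print(estimate_total({"stratum 1": 250, "stratum 2": 400}, {"stratum 1": t, "stratum 2": t}))
```

The farm share of the expected sample comes out at the stratum's 30 percent rather than the selected unit's 15 percent, which is the point of allocating in proportion to $P_{hi}/P_{hij}$.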

The allocation of the subsampling indicated above may be deviated from and 
the controls of area substratification can still be maintained if proper modifica- 
tions are made in the sample estimate. In this event, differential weighting 
must be introduced at the substrata level rather than only for the primary strata. 

The definition of heterogeneous primary sampling units, the proper
classification of them into strata, and the use of probabilities proportionate to the
measures of size in the selection of the primary units are particularly desirable if area
substratification is used. If these are not introduced the likelihood of making 
substantial gains through the use of area substratification is decreased. The 
definition of the primary strata should be made in conjunction with the definition 
of the substrata, and should insure that each primary unit has adequate repre- 
sentation of each substratum that is to be defined within that primary stratum. 
With this restriction observed, the number of significant substrata that can be 
defined will be limited by the heterogeneity of the primary units. Thus, in 
order to provide for substratification into predominantly farm and predomi- 
nantly nonfarm areas, the primary sampling units should be defined so that both 
farm and nonfarm areas are represented in each unit. This procedure not only 
makes area substratification more effective, but improves the efficiency of the 
sample in making separate estimates for such classes of the population.
However, if this procedure cannot be adhered to exactly in practice, primary units in
which certain of the substrata are not represented will occasionally come into the
sample. One alternative when this occurs is to combine certain substrata; 
another is to exclude such primary units from the sample. 

Since the number of primary strata is restricted by the number of primary
units to be sampled, it is wasteful to set up strata at the primary level with
respect to sources of variation that can be controlled adequately through area
substratification. For example, if farm areas and nonfarm areas are to be
distinguished in the substrata, the primary strata should not be exhausted by
classifying the primary units into a large number of strata by percent farm (percent
of total population in primary unit living on farms), since the effect of the sub- 
stratification is to control the variation in the percentage farm. Limiting the 
number of percentage farm classes at the primary level makes possible the use 
of other modes of stratification that will control on farm type, or on the indus- 
trial character of the nonfarm population, or on some other similar criteria. 

Area substratification is to be distinguished from the fairly commonly used 
method of specifying the number of elements to come into the sample from each 
of several different classes of elements, whether such quotas are fixed to make the
sample correspond with the specified characteristics of the entire primary stra- 
tum or of the selected primary sampling unit. The method of fixing quotas and 
instructing interviewers or enumerators to obtain a given number of elements 
(persons, dwelling units, farms, voters, etc.) having various specified charac- 
teristics has a fundamental weakness that is avoided in area substratification 
within primary strata. Such quotas ordinarily must be set on the basis of
previous information or rough estimates, and thus cannot accurately reveal
changing characteristics of the population. Area substratification, on the other hand,
uses previous information to insure the proper representation of various types of 
areas in the sample. The numbers of elements obtained with various specified 
characteristics are determined from the population as it is, and not as it was at 
some previous date. In times of rapid change the fixing of quotas on the basis of
previous information may introduce increasingly serious biases. 

The gain from using previously available information in stratifying areas 
arises from the fact that there is a high correlation in the characteristic of an 
area from time to time over a period of several years. An area that is pre- 
dominantly farm at one date ordinarily will be predominantly farm a few years 
later. Similarly, while very substantial shifts in population may occur, the
numbers of persons in a set of areas at one time ordinarily will be very highly
correlated with the numbers a few years later. However, area substratification does
not depend on the fact that no shifts occur. If shifts have occurred it will 
measure them. If the shifts have been sufficient to completely alter the charac- 
ter of most small areas, it will still provide estimates revealing the changing 
character of the population, but under these circumstances the efficiency of the 
method is decreased. 

V— EXPECTED VALUES AND VARIANCES FOR THE SUBSAMPLING 
SYSTEM INCORPORATING THE PRINCIPLES OUTLINED ABOVE 

The system of sampling incorporating the principles of enlarged primary 
units, the selection of primary units with probabilities proportionate to the 
measures of size and area substratification will be examined more fully below. 
It will be referred to, for convenience, as the specified subsampling system.
1. The expected value of an estimated total for the specified subsampling
system. All summations in the formulas below are over the population unless
otherwise indicated. The expected value of $X'$ as defined in Eq. (7) is

$$EX' = \sum_h \sum_j \sum_i \sum_k \frac{P_{hj}}{P_h} \cdot \frac{1}{t_h} \cdot \frac{m_{hij}}{M_{hij}}\, X_{hijk}.$$

From (5), $t_h = m_{hij} P_{hij} / (M_{hij} P_{hi})$, and therefore

$$EX' = \sum_h \sum_j \sum_i \sum_k \frac{P_{hi}}{P_{hij}} \cdot \frac{P_{hj}}{P_h}\, X_{hijk}
= \sum_h P_h \sum_j \sum_i \frac{P_{hj}}{P_h} \cdot \frac{P_{hi}}{P_h} \cdot \frac{X_{hij}}{P_{hij}}
= \sum_h P_h R_{h(A)},$$

where

$$P_h = \sum_i P_{hi} = \sum_j P_{hj}, \qquad R_{h(A)} = \sum_j \frac{P_{hj}}{P_h} R_{hj(A)},$$

$$R_{hj(A)} = \sum_i \frac{P_{hi}}{P_h} R_{hij}, \qquad R_{hij} = \sum_k X_{hijk} / P_{hij} = X_{hij} / P_{hij}.$$

The $R_{hj(A)}$ will be referred to as the adjusted ratio for the $j$-th primary unit.
It is the weighted average within the $j$-th unit of the substrata ratios, $R_{hij}$,
where the same set of weights $P_{hi}$ is applied to the $R_{hij}$ in each primary unit
within a stratum. The $R_{h(A)}$ is the average, within the $h$-th stratum, of the
adjusted ratios, weighted by $P_{hj}/P_h$. Hence

$$EX' = X + \sum_h P_h \left(R_{h(A)} - R_h\right), \qquad (8)$$

where

$$R_h = X_h / P_h, \quad \text{with } X_h = \sum_i \sum_j X_{hij},$$

is the ratio of the aggregate value of the specified characteristic for the elements
in the $h$-th stratum to the measure of size of that stratum, and where the population
character being estimated, (6), is equal to $X = \sum_h X_h = \sum_h P_h R_h$.

From (8), it is seen that $X'$ is a biased estimate of $X$, although ordinarily, in
practice, only slightly so. The bias, equal to $\sum_h P_h(R_{h(A)} - R_h)$, is the sum of the
biases for the various primary strata. Under many practical circumstances
some of these will be slightly negative and some slightly positive, with the result
that the total bias will be relatively small. The bias would be nonexistent if
area substratification were not used, or if the form of the sample estimate were
properly modified, but here again, as in the case of substituting biased for
unbiased estimates discussed in Sec. III, the introduction of a slight bias may result
in a substantial reduction in the variance.

A sufficient, although not necessary, condition for the sample estimate (7)
with area substratification to be unbiased is for the ratios $P_{hij}/P_{hj}$ to be
uncorrelated with the $R_{hij}$ within each substratum. Under these circumstances

$$\sum_j \frac{P_{hj}}{P_h} \cdot \frac{P_{hij}}{P_{hj}}\, R_{hij} = \frac{P_{hi}}{P_h} \sum_j \frac{P_{hj}}{P_h}\, R_{hij},$$

and therefore

$$R_h = \sum_i \sum_j \frac{P_{hij}}{P_h}\, R_{hij} = \sum_i \sum_j \frac{P_{hi}}{P_h} \cdot \frac{P_{hj}}{P_h}\, R_{hij} = R_{h(A)}.$$

To illustrate, if the measures of size are the 1940 populations, then the sample
estimate will be unbiased if the proportions of the 1940 populations of the
primary sampling units that are in the various substrata are uncorrelated with the
corresponding $R_{hij}$. As indicated earlier these conditions are approximated in
many practical problems, especially if the primary stratification has been carried
out effectively. Moreover, even if the conditions are not approximately met, the
bias introduced may still be very small. (See Sec. VII for a numerical
illustration.)
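Equation (8) and the sufficient condition for unbiasedness can be checked exactly for a single primary stratum. The sketch assumes one primary unit drawn with probability $P_{hj}/P_h$, allocation (5), and hypothetical cell values $P_{hij}$ and $X_{hij}$ for three primary units and two substrata; the function name is illustrative. The second case uses identical substratum shares $P_{hij}/P_{hj}$ in every unit, the degenerate extreme of "uncorrelated."

```python
from fractions import Fraction as F

def stratum_quantities(P, X):
    """Exact E(X'), the stratum total X_h, and the bias term of Eq. (8) for one stratum.

    P[j][i]: measure of size P_hij of substratum i within primary unit j
    X[j][i]: aggregate X_hij of the characteristic in the same cell
    Assumes one unit drawn with probability P_hj / P_h and allocation (5).
    """
    Q, S = len(P), len(P[0])
    P_j = [sum(row) for row in P]                                 # P_hj
    P_i = [sum(P[j][i] for j in range(Q)) for i in range(S)]      # P_hi
    P_h = sum(P_j)
    # E(X') = sum_j (P_hj / P_h) * sum_i (P_hi / P_hij) * X_hij
    EX = sum(F(P_j[j], P_h) * sum(F(P_i[i], P[j][i]) * X[j][i] for i in range(S))
             for j in range(Q))
    # adjusted ratios R_hj(A) use the same weights P_hi / P_h in every unit
    R_jA = [sum(F(P_i[i], P_h) * F(X[j][i], P[j][i]) for i in range(S)) for j in range(Q)]
    R_A = sum(F(P_j[j], P_h) * R_jA[j] for j in range(Q))        # R_h(A)
    X_h = sum(sum(row) for row in X)
    R_h = F(X_h, P_h)
    return EX, X_h, P_h * (R_A - R_h)

X_cells = [[25, 400], [90, 310], [200, 160]]                      # hypothetical X_hij

# Unequal farm shares across units: Eq. (8) holds, with a nonzero bias
EX, X_h, bias = stratum_quantities([[10, 90], [40, 60], [70, 30]], X_cells)
print(EX, X_h, bias)

# Identical substratum shares P_hij / P_hj in every unit: the bias vanishes
EX2, X_h2, bias2 = stratum_quantities([[30, 70], [60, 140], [15, 35]], X_cells)
print(EX2, X_h2, bias2)
```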


2. The mean square error of an estimated total for the specified subsampling 
system. For the development of the mean square error of X' for the specified 
subsampling system, see the Appendix, Section 2. There it is shown that the 
mean square error of X' is 
is the variance between subsampling units within a substratum of the aggregate
value of a specified characteristic for the subsampling unit and

$$\bar{P}_{hij} = P_{hij} / M_{hij}$$

is the average measure of size of the subsampling units in the $h$-$i$-$j$-th area.

The first term of (9) is the contribution of the variance between subsampling 
units and may be kept small by proper definition of the subsampling units, and, 
of course, by increasing the subsampling ratio. The second term of (9) is the 
contribution of the variance between primary sampling units within strata; 
and the third term is the contribution of the bias, which, as indicated before, 
ordinarily will be of negligible size, so that the mean square error and the vari- 
ance will be approximately equal. 

It is the variance between primary sampling units that contributes most 
heavily to the total variance in many subsampling situations, and it is on this 
contribution that the modifications proposed in this paper have their principal
effect. The effect of area substratification is seen by comparing the variance
between primary units given above with that obtained if area substratification
were not used but other aspects of the design remained unchanged. In this
event the variance between primary units involves the variance of the ratio
$R_{hj} = \sum_i X_{hij} / P_{hj} = X_{hj} / P_{hj}$, instead of the variance of the adjusted
ratio, $R_{hj(A)}$.

The relationship between the variance of $R_{hj}$ and that of $R_{hj(A)}$ within the
$h$-th primary stratum is given by

$$\sigma^2_{R_{hj}} = \sigma^2_{R_{hj(A)}} + \sigma^2_{R_{hj} - R_{hj(A)}} + 2\rho\,\sigma_{R_{hj(A)}}\,\sigma_{R_{hj} - R_{hj(A)}}, \qquad (10)$$

where $\sigma^2_{R_{hj} - R_{hj(A)}}$ is the variance of the difference between the adjusted and the
unadjusted ratios, and $\rho$ is the correlation between the adjusted ratio and the
amount of the adjustment. Thus, if the correlation is near zero or positive,
there will be a gain from the introduction of area substratification, although there
may be a loss if the correlation is highly negative. Essentially, the condition
for $\rho$ being equal to or near zero is the same as that for the sample estimate being
unbiased; namely, that the $P_{hij}/P_{hj}$ be uncorrelated or only slightly correlated
with the $R_{hij}$ within each substratum.*

The variance of $R_{hj(A)}$ rather than that of $R_{hj}$ occurs in the variance of $X'$
because the subsampling numbers were allocated proportionate to the $P_{hi}$,
no matter what primary sampling unit happened to be selected for inclusion in
the sample. The ratio $R_{hj}$, like $R_{hj(A)}$, may be regarded as a weighted average
of the $R_{hij}$, but with the weights equal to $P_{hij}$ instead of $P_{hi}$, and thus varying
from primary unit to primary unit. It would appear, therefore, from the
relationship of the variances given above, that if the substrata are effective, and if
the $P_{hij}$ are highly correlated with the actual sizes of the substrata, the weighted
average using fixed weights in all primary units should have a considerably
smaller variance than that using variable weights. This turns out to be the
case in many practical situations, some illustrations of which will be given later 
(see Sec. VII). 
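The comparison of fixed and variable weights can be made concrete with a small hypothetical stratum of four equal-sized primary units and two substrata whose ratios differ sharply while varying little within a substratum. The sketch computes the unadjusted ratios $R_{hj}$ (weights $P_{hij}/P_{hj}$), the adjusted ratios $R_{hj(A)}$ (fixed weights $P_{hi}/P_h$), and their variances between units weighted by $P_{hj}/P_h$, and checks relation (10) in its covariance form. All numbers and helper names are illustrative assumptions.

```python
from fractions import Fraction as F

def weighted_var(values, weights):
    """Variance about the weighted mean; the weights are assumed to sum to 1."""
    mean = sum(w * v for w, v in zip(weights, values))
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values))

def weighted_cov(a, b, weights):
    ma = sum(w * x for w, x in zip(weights, a))
    mb = sum(w * y for w, y in zip(weights, b))
    return sum(w * (x - ma) * (y - mb) for w, x, y in zip(weights, a, b))

# Hypothetical stratum: P[j][i] measures of size, R[j][i] substratum ratios R_hij
P = [[10, 90], [40, 60], [70, 30], [30, 70]]
R = [[2, 5], [2, 6], [3, 5], [2, 5]]
Q, S = len(P), len(P[0])
P_j = [sum(row) for row in P]
P_i = [sum(row[i] for row in P) for i in range(S)]
P_h = sum(P_j)
w = [F(p, P_h) for p in P_j]                                  # weights P_hj / P_h

R_j = [sum(F(P[j][i], P_j[j]) * R[j][i] for i in range(S)) for j in range(Q)]  # variable weights
R_jA = [sum(F(P_i[i], P_h) * R[j][i] for i in range(S)) for j in range(Q)]     # fixed weights
D = [r - a for r, a in zip(R_j, R_jA)]                         # amount of the adjustment

var_R, var_A = weighted_var(R_j, w), weighted_var(R_jA, w)
# Relation (10) in covariance form: var(R_hj) = var(R_hj(A)) + var(D) + 2 cov(R_hj(A), D)
assert var_R == var_A + weighted_var(D, w) + 2 * weighted_cov(R_jA, D, w)
print(var_R, var_A)
```

Here the fixed-weight ratios vary less than half as much between units as the variable-weight ratios, because most of the variation in $R_{hj}$ comes from differing farm shares rather than from differing substratum ratios.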

3. The mean square error of ratio estimates for the specified subsampling
system. The need for estimating a ratio from a sample arises in two cases; 
first, when the ratio is the population character for which an estimate is desired, 
and second, when the application of a ratio from the sample to a known total 
uses additional available information for obtaining an improved estimate of the 
desired total. 

Ratio estimates are desired as an end-result when, for example, the change in 
a characteristic from one time to another is being considered. Thus, if Y' is 
the estimated total income of farm workers at one date, and X' the corresponding 
estimated total income at a second date, then $r' = X'/Y'$ is an estimate of the
relative change in the total income of farm workers over the period of time
covered. Similarly, the estimate of a percentage such as the percentage of the
workers unemployed will involve the ratio of two random variables from the
sample. Ratio estimates from a sample may be particularly useful in instances
where the reliability of the ratio estimate is greater than the reliability of the
estimate of either the numerator or the denominator, as is frequently the case.

* Actually, a sufficient, although not necessary, condition for $\rho$ to be equal to zero is
that $P_{hij}/P_{hj}$ be uncorrelated with both the ratio $R_{hij}$ and the cross-product
$R_{hij} R_{hgj}$ for all pairs of substrata.

Ratio estimates may also be used as a means of obtaining an estimated aggregate value of a specified characteristic if $Y$, the aggregate value of a second characteristic highly correlated with $X$, is known exactly from independent sources, and $X'$ and $Y'$, estimates of $X$ and $Y$ respectively, are available from the sample. Thus

$$X'' = \frac{X'}{Y'}\,Y = r'Y \qquad (11)$$

is an estimate of the aggregate value of the specified characteristic. If the correlation, in successive samples, between $X'$ and $Y'$ is sufficiently high, the ratio estimate will be a more efficient estimate of $X$ than will $X'$, the simple estimated total given earlier in (7); but $X'$ will prove the more reliable estimate when the correlation is low.⁴ Thus, when the correlation between $X'$ and $Y'$ is sufficiently high, $X''$ makes use of more of the relevant available information for estimating $X$ than does $X'$.
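As a rough numerical check of the condition in footnote 4, the sketch below evaluates the approximate relvariance (squared coefficient of variation) of $r'$ and compares it with that of $X'$. It assumes the footnote's first-order approximation and uses the fact that, with $Y$ known, $X'' = r'Y$ has the same coefficient of variation as $r'$. The function names and the coefficients of variation in the usage lines are hypothetical.

```python
def approx_ratio_relvariance(v_x, v_y, rho_xy):
    """First-order relvariance of r' = X'/Y' (footnote 4 divided by r^2):
    V_r'^2 ~= V_X'^2 + V_Y'^2 - 2 * rho_X'Y' * V_X' * V_Y'."""
    return v_x ** 2 + v_y ** 2 - 2.0 * rho_xy * v_x * v_y


def ratio_beats_simple(v_x, v_y, rho_xy):
    """True when X'' = r'Y (Y known, so V_X'' = V_r') has a smaller relvariance than X'."""
    return approx_ratio_relvariance(v_x, v_y, rho_xy) < v_x ** 2


def correlation_threshold(v_x, v_y):
    """V_r'^2 < V_X'^2 reduces to V_Y'^2 < 2 rho V_X' V_Y', i.e. rho > V_Y' / (2 V_X')."""
    return v_y / (2.0 * v_x)


def ratio_estimate_total(x_est, y_est, y_total):
    """Eq. (11): X'' = (X'/Y') * Y."""
    return x_est / y_est * y_total


# Hypothetical coefficients of variation: V_X' = 0.10, V_Y' = 0.08.
print(correlation_threshold(0.10, 0.08))            # about 0.4
print(ratio_beats_simple(0.10, 0.08, rho_xy=0.9))   # True
print(ratio_beats_simple(0.10, 0.08, rho_xy=0.3))   # False
```

The threshold makes the footnote's last remark concrete: the larger $V_{Y'}$ is relative to $V_{X'}$, the higher the correlation must be before the ratio estimate pays off.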

⁴ The variance of the ratio of random variables of the form $r' = X'/Y'$ is approximately $\sigma^2_{r'} = r^2\left(V^2_{X'} + V^2_{Y'} - 2\rho_{X'Y'}V_{X'}V_{Y'}\right)$, where $V$ indicates the coefficient of variation of the variable designated by the subscript, and $\rho_{X'Y'}$ is the correlation. Hence, if $\rho_{X'Y'}$ is sufficiently large, $V_{r'}$ will be less than $V_{X'}$. The size of $\rho_{X'Y'}$ required depends on the relative magnitudes of the coefficients of variation of $X'$ and $Y'$.

The application of ratio estimates to the specified subsampling system is considered below.

(a) The estimated ratio and its mean square error. The estimate of the population ratio $r = X/Y$ is

$$r' = \frac{X'}{Y'} \qquad (12)$$

where $X'$ is given in (7) above, and $Y'$ is a similar estimate of the total value of a second characteristic. The mean square error of $r'$ is approximately

(13)

where

$X_{hijk}$ = the aggregate value of a specified characteristic for the elements in the $k$-th subsampling unit within the $h$-$i$-$j$-th area, for which a total is to be estimated;

$Y_{hijk}$ = the aggregate value of a second specified characteristic for the elements in the same subsampling unit, and for which the total in the population is known;

$Y_{hij} = \sum_k Y_{hijk}$, and $Y_h = \sum_i\sum_j Y_{hij}$;

$\sigma^2_{hij:Y} = \sum_k (Y_{hijk} - \bar Y_{hij})^2/M_{hij}$ is the variance of the sampling units in the $h$-$i$-$j$-th area with respect to the second characteristic, and $\bar Y_{hij} = Y_{hij}/M_{hij}$;

$R_{hi(A):Y}$ is the adjusted average of the $Y_{hij}$;

$r_{hijk} = X_{hijk}/Y_{hijk}$, $r_{hij} = X_{hij}/Y_{hij}$, $r_{hi(A)} = R_{hi(A)}/R_{hi(A):Y}$, etc., are the ratios of the $X$ to the $Y$ for the areas indicated by the subscripts; and

$r_{h(A)} = R_{h(A)}/R_{h(A):Y}$ are the ratios of the adjusted ratios for $X$ and $Y$ indicated by the subscripts;

and the remaining symbols are as defined in the sections above where the expected value and variance of $X'$ are given.

The first and third terms of (13) are, ordinarily, the principal contributing terms. The second and fourth terms contain contributions due to the variation between the means of the substrata and of the primary strata, respectively, even though the sample was stratified with respect to these classes. In some instances, the contributions of these terms will be important. The between-strata contributions arise because the primary and subsampling units vary in size with respect to the character $Y$.

This formula for the mean square error of a ratio is approximately equal to the more commonly used one given in footnote 4. The two formulas, both of which are approximations, would be identical if certain terms which are ordinarily negligible were retained in (13). Formula (13) has the advantage of indicating the effect of different aspects of the design of the sample on the variance of the ratio. The derivation of this approximate variance formula is given in the Appendix, Section 3, together with an indication of the accuracy of the approximation.⁵

(b) The estimated totals and their mean square errors. As mentioned earlier, two estimates of $X$, the aggregate value of a given characteristic for all elements, are $X'$ (7) and $X''$ (11). The mean square error of $X'$ is given by (9), and that of $X''$ is simply equal to $Y^2\sigma^2_{r'}$, where $\sigma^2_{r'}$ is given approximately by (13).
The decision as to whether to use $X'$ or $X''$ as an estimate of $X$ depends, of course, in the first instance on whether $Y$ is known, and in the second instance on the relative magnitudes of the respective mean square errors given in (9) and (13). These may be approximated from prior knowledge concerning the relationships in the population under investigation, or they may be estimated from preliminary sample investigations. However, in instances where there is a positive correlation between the $X_{hijk}$ and the $Y_{hijk}$ within substrata, it is fairly safe to assume that if the information necessary for the ratio estimate is available, there will be little to lose and possibly considerable to gain from its use.

The use of (11) instead of (7) is often desirable when $Y$ in (11) is the aggregate value of the actual sizes of the primary units, and $Y'$ is its estimate. This is particularly so if the measures of size used are not fairly precise measures of the actual sizes and if, at the same time, the actual size is highly correlated with the character being estimated; in that case the use of ratio estimates will yield gains in both the between-primary-unit contribution and the within-primary-unit variance. (See Sec. VII for numerical illustrations.) However, if the measures of size are identical with the actual sizes (i.e., $P_{hij} = Y_{hij}$), the last two terms of (13) are identical with the between-primary-unit contribution to the variance of $X'$ (9), and only the within-primary-unit variance is affected by the ratio estimate.

While it is fairly safe in practice, if $Y$ is known, to make use of $X''$ instead of $X'$ as the estimate of $X$, some care must be exercised to make sure that the $X_{hijk}$ have at least a moderately high average correlation with the $Y_{hijk}$, where the correlations considered are those within substrata within primary sampling
units. If this correlation is low, and if the size of the subsampling unit varies 
considerably, the ratio estimate may be considerably less efficient than the simple 
total estimate. On the other hand, if the measures of size of the various sub- 
strata and of the primary sampling units are fairly close measures of the actual 
size, and if the subsampling units have been carefully defined so that they do 
not vary too greatly in size, the two estimates are likely to have about the same 
efficiency. 
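The contrast drawn in the last two paragraphs can be seen exactly on a toy population. The sketch below is a minimal illustration under simple random sampling without replacement (an assumption: it does not reproduce the stratified, multi-stage specified subsampling system). It enumerates every possible sample and computes the exact mean square errors of the expansion estimate $X' = (N/n)\sum x$ and the ratio estimate $X'' = (X'/Y')Y$. The populations are invented for illustration.

```python
from itertools import combinations


def exact_mse(xs, ys, n):
    """Exact MSEs of X' = (N/n) * sum(x) and X'' = (X'/Y') * Y, averaged over
    all equally likely simple random samples of size n (without replacement)."""
    N = len(xs)
    X, Y = sum(xs), sum(ys)          # population totals; Y is treated as known
    se_simple = se_ratio = 0.0
    samples = list(combinations(range(N), n))
    for s in samples:
        x_est = N / n * sum(xs[i] for i in s)   # simple expansion estimate X'
        y_est = N / n * sum(ys[i] for i in s)   # Y'
        x_ratio = x_est / y_est * Y             # ratio estimate X''
        se_simple += (x_est - X) ** 2
        se_ratio += (x_ratio - X) ** 2
    return se_simple / len(samples), se_ratio / len(samples)


ys = [2, 5, 9, 14, 20]          # hypothetical sizes of five units
high = [7, 14, 28, 41, 62]      # X roughly proportional to Y
low = [10, 10, 10, 10, 10]      # X unrelated to Y

mse_simple_high, mse_ratio_high = exact_mse(high, ys, 2)
mse_simple_low, mse_ratio_low = exact_mse(low, ys, 2)
```

With these made-up numbers the ratio estimate has by far the smaller mean square error when $X$ tracks $Y$, while for the $X$ that is constant across units the simple estimate is exact and the ratio estimate is not, which is the situation the caution above warns about.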

VI— SOME PHYSICAL PROPERTIES OF FREQUENTLY OCCURRING 
POPULATIONS THAT ARE BASIC TO THE SAMPLING 
PRINCIPLES RECOMMENDED IN THIS PAPER 

Many actual populations are characterized by the following physical proper- 
ties: 

(i) The elements within a cluster are positively correlated with regard to a 
specified characteristic. 

(ii) Clusters containing large numbers of elements have greater internal hetero- 
geneity than clusters containing small numbers of elements. 

(iii) Increasing the size of the cluster brings in correlated elements (e.g., in population or agriculture surveys larger clusters are formed by including households or farms in adjacent areas).

The first of these properties is recognized implicitly in the literature where the
losses of efficiency through the use of large clusters as sampling units are fre- 
quently cited. In our experience the second and third properties hold just as 
commonly in actual populations, and ordinarily for the same populations for 
which the first property holds. 

The presence of these physical properties in combination within strata leads 
to the following mathematical relationships that have been used throughout 
this paper: 

(a) The sizes of the primary sampling units, $N_{hj}$, are negatively correlated with the $\rho_{hj}$, the intra-class correlations between elements within the units;

(b) The $N_{hj}$ and $N_{hj}\rho_{hj}$ are positively correlated;

(c) The $N_{hj}$ and $\sigma^2_{hj}$ are positively correlated;

(d) The $N_{hj}$ and $\sigma^2_{hj}/N_{hj}$ are negatively correlated.

The use of these relationships has determined most of the choices among 
alternative procedures throughout this paper. The relationships, of course, do 
not necessarily hold, and exceptions to them can be found [5]. The frequent 
occurrence of populations characterized by such properties justifies further re- 
search on the more effective use of these and other properties that may be found 
to hold.

VII— SOME APPLICATIONS OF THE PRINCIPLES DESCRIBED 
IN THIS PAPER TO AN ACTUAL SAMPLING PROBLEM 

The analyses summarized below were carried out for the purpose of deciding
between alternative sampling procedures in the revision of a monthly national
sample for labor force and other characteristics. Budgetary and administrative 
restrictions made it necessary to confine the field operations to a limited number 
of administrative centers scattered over the country, from which a sample of 
less than one-tenth of one percent of the population of the United States was 
to be drawn. 

The original sample (the one to be revised) was of a usual subsampling design 
in which counties were used as the primary sampling units, and households or 
small clusters of households were used as the subsampling units. In the revised 
sample contiguous counties were combined wherever administratively feasible, 
to form more heterogeneous primary units than the individual counties. Ap- 
proximately 2000 primary sampling units were formed from the 3000 counties in 
the United States. The combinations of counties, the primary stratification, 
the area substratification, and the measures of size, were determined on the basis 
of 1940 Decennial Census data together with more recent data where available.⁶

The applications of the various principles suggested in this paper have been evaluated by estimating 1930 Census labor force characteristics from a sample that was stratified on the basis of 1940 and more recent data. This constituted a particularly severe test of some of the methods, because of the substantial shifts that had taken place during the 10-year interval between 1930 and 1940.

⁶ See [11] for a full description of the proposed revised sample, including an outline of the criteria of stratification used. That paper may be useful as a simple description of an application of the specified subsampling system.

The analyses to be summarized in this section are concerned primarily with 
the gains obtainable under favorable circumstances by the introduction of three 
sampling principles; namely, 

(1) enlarged primary units (see Sec. IV-1); 

(2) the sampling of primary units with probability proportionate to measures 
of their size (see Sec. IV-2); 

(3) area substratification (see Sec. IV-3). 

Some comparisons are also given to illustrate the effect of using alternative 
sample estimating formulas. Computations have been made for six of the prin- 
cipal items that are currently being included in a monthly report of the labor 
force; namely, total numbers of male and female workers, total numbers of male 
and female agricultural workers, and total numbers of male and female non- 
agricultural workers. The comparisons between alternative systems have been 
made holding constant both the primary stratification criteria and the expected 
numbers of persons to be drawn into the sample. 

The percentage gains given below are the reductions in the between-primary-unit contributions (which include the bias contributions) to the mean square error.⁷ Except where otherwise specified, the sample estimate used is given by (7).

1. Gains obtained by introducing enlarged primary units. The gains obtained 
by using enlarged primary units are calculated by comparing the mean square 
errors arising from the sampling design in which individual counties are primary 
units with the mean square errors arising from the design in which combinations 
of counties are the primary units. In both designs, the primary units are drawn 
with equal probabilities and no area substratification is used. For this compari- 
son, preliminary computations have been completed for only a limited number 
of strata and for two of the labor force items given above; namely, total male 
workers and total female workers. The reduction in the sampling errors ob- 
tained by introducing enlarged primary units is estimated to be 48 per cent for 
total male workers and 26 per cent for total female workers. 

⁷ The contribution of the variance within the primary units to the total mean square error was relatively small in all instances, and practically unaffected by the introduction of the various principles.

2. Further gains obtained by introducing probability proportionate to measures of size. The further gains obtained by using the principle of sampling with probability proportionate to measures of size are calculated by comparing the mean square errors arising from the design in which the units are drawn with equal probability with the mean square errors arising from the design in which the units are drawn with probability proportionate to measures of size. In both designs, the primary units are combinations of counties, and in neither of them is area substratification used. The estimated per cent gains are as follows:


Workers            Male   Female
Total               50       8
Agricultural        77       6
Nonagricultural     19      21

The gains reflect both decreases in the sampling variance and the elimination of the bias which arises when the primary units are drawn with equal probabilities.⁸

3. Further gains obtained by introducing area substratification. The further 
gains obtained by using the principle of area substratification are calculated by 
comparing the mean square errors for the design in which area substratification 
is not used, with those for the design in which area substratification is introduced. 
In both these designs the primary units are combinations of counties, and are drawn with probability of selection proportionate to measures of their sizes. The estimated per cent gains are as follows:

Workers            Male   Female
Total                6      31
Agricultural        46      51
Nonagricultural     32      22

4. Gains obtained by the integration of the above principles into a single subsampling system (the specified subsampling system). The gains obtained by using all three principles are calculated by comparing the mean square errors for
the specified subsampling system (in which all three principles are used) with 
the mean square errors for the system in which none of these principles is used. 
In the specified subsampling system, combined counties are the primary units, 
the primary units are drawn with probability proportionate to measures of their 
size, and area substratification is used. In the other system, the primary units 
are individual counties, the sampling is done with equal probabilities and area 
substratification is not used. Preliminary computations for this comparison 
are available for only 2 of the 6 labor force items; namely, total male and total 
female workers. The estimated gains were 76 per cent for male workers and 53 
per cent for female workers. 

⁸ As indicated before, estimate (7) is used in both designs compared above. This estimate is unbiased for the design in which the primary units are drawn with probability proportionate to measures of size, but is biased for the design in which they are drawn with equal probabilities. However, for the latter design, the biased estimate is usually much more efficient than the best linear unbiased estimate. For the six labor force items, the best linear unbiased estimate gives rise to variances that are several times as large as the mean square errors for the biased estimate.

Calculations are available for all 6 items to measure the gains obtained by the use of the last two of the principles in combination, namely, probability proportionate to measures of size and area substratification. For measuring these gains, the systems are as described above, except that in both designs the primary units are combinations of counties. The estimated per cent gains are as follows:


Workers            Male   Female
Total               54      37
Agricultural        88      54
Nonagricultural     45      39

While both the specified subsampling system and the alternative to which it was 
just compared are biased designs, the bias in the specified system is appreciably 
smaller than the bias in the latter. For example, while the bias of the specified 
system in the estimation of total male workers was less than one-half per cent 
of the true total male workers, the bias for the alternative design in the estima- 
tion of the same population character was more than one and one-half per cent. 
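Because each percentage in Secs. VII-2 to VII-4 is a reduction relative to the preceding design, successive gains should compound multiplicatively: a further gain of $b$ per cent after a gain of $a$ per cent leaves $(1 - a/100)(1 - b/100)$ of the original mean square error. The short check below is an illustration, not a computation reported in the paper. Applied to the published figures, it reproduces the combined gains above to within one point (the residual differences presumably reflect rounding of the published percentages), and it is also consistent with the 76 and 53 per cent overall gains of Sec. VII-4 when the 48 and 26 per cent gains for enlarged units are compounded with the combined male and female gains.

```python
def compound(*gains_pct):
    """Overall percentage reduction in MSE from successive percentage reductions."""
    remaining = 1.0
    for g in gains_pct:
        remaining *= 1.0 - g / 100.0   # fraction of the MSE left after this step
    return 100.0 * (1.0 - remaining)


# Order: total M, total F, agricultural M, agricultural F, nonagricultural M, F
pps = [50, 8, 77, 6, 19, 21]          # Sec. VII-2
substrat = [6, 31, 46, 51, 32, 22]    # Sec. VII-3, further gains
combined = [round(compound(a, b)) for a, b in zip(pps, substrat)]
print(combined)   # [53, 37, 88, 54, 45, 38]; published: 54 37 88 54 45 39

# Sec. VII-4: enlarged units (48, 26) followed by the combined gains (54, 37)
print(round(compound(48, 54)), round(compound(26, 37)))   # 76 53
```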

5. The choice of estimate to use with the specified subsampling system. The simple estimate (7) given for the specified subsampling system may be improved on by the use of regression techniques (see Sec. III). However, such techniques may require a great deal of clerical work, so that they frequently cannot be used in practice. As indicated in the last part of Sec. V, however, if certain independent information, such as a knowledge of the total population, is available, a simple ratio estimate of the form of (11) may sometimes introduce gains over (7). The use of the ratio estimate may be particularly desirable when the correlation between the measures of size and the actual sizes of the primary sampling units is only moderately high and when, at the same time, the actual sizes are highly correlated with the values for the character being estimated. A small-scale experiment in the sampling for labor force items indicated that, for estimating total male workers for 1930, both the variance between primary units and the variance within primary units for the ratio estimate (11) were approximately one-half those for the simple estimate (7). The use of the ratio estimate had very little effect in the estimation of the remaining five labor force characteristics. The reduction in variance of the total male employment figure was brought about because migration since 1930 reduced the correlation between the 1930 and 1940 sizes and, furthermore, the number of male workers is highly correlated with the total population. Similar reductions for the variances of the other five items were not obtained because the correlations with actual sizes for the other items were not as high.

6. Some final remarks. The gains just obtained arose from application of the sampling principles enumerated above. The situations to which these principles were applied are favorable, but are frequently met in practice. The principles differ in their effect depending on the particular attributes of the population being studied. The use of enlarged primary units may be desirable whenever the enlarged units are internally more heterogeneous than are the smaller units. The selection of primary units with probability proportionate to size is desirable for the general classes of populations described in Sec. VI whenever the primary units vary considerably in size. The use of area substratification is limited to sampling situations where large primary units are used. The joint effect of all three principles shows to greatest advantage when subsampling is used, the primary units are large but variable in size, and the number of primary units included in the sample is limited by cost or administrative conditions. The types of estimates described in Sec. III may be effective in a large number of physical situations other than those mentioned in this paper.

APPENDIX

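1. The effect of the consolidation of the primary units on the sampling variance (see Sec. IV-1). Let $\bar X_1' = \sum_j\sum_k X_{jk}/qn$ be the average for the sample where the primary units are the original units, and where $X_{jk}$ is the value of the $k$-th element in the $j$-th primary unit; $q$ is the number of primary units in the sample, and $n$ is the number of elements sampled from each of the $q$ primary units. The variance of $\bar X_1'$ is

$$\sigma^2_{\bar X_1'} = \frac{N-n}{(N-1)nq}\,\sigma_w^2 + \frac{Q-q}{(Q-1)q}\,\sigma_b^2 \qquad (14)$$

where $Q$ is the number of original primary units in the population; $N$ is the number of elements in each original primary unit; $\sigma_w^2 = \sum_j\sum_k (X_{jk} - \bar X_j)^2/QN$ is the variance within the original primary units, with $\bar X_j = \sum_k X_{jk}/N$; and $\sigma_b^2 = \sum_j (\bar X_j - \bar X)^2/Q$ is the variance between the original primary units, with $\bar X = \sum_j \bar X_j/Q$. Let

$$\sigma^2 = \sum_j\sum_k (X_{jk} - \bar X)^2/QN = \sigma_w^2 + \sigma_b^2 \qquad (15)$$

Then

$$\sigma_b^2 = \frac{\sigma^2}{N}\left[1 + \rho_1(N-1)\right] \qquad (16)$$

where $\rho_1$ is the intra-class correlation⁹ between elements in the original units. From (15) and (16)

$$\sigma_w^2 = \frac{N-1}{N}\,\sigma^2(1-\rho_1) \qquad (17)$$

Hence

$$\sigma^2_{\bar X_1'} = \frac{N-n}{N}\,\frac{\sigma^2}{nq}(1-\rho_1) + \frac{Q-q}{Q-1}\,\frac{\sigma^2}{Nq}\left[1+\rho_1(N-1)\right] \qquad (18)$$

Similarly, the variance of $\bar X_2'$ is

$$\sigma^2_{\bar X_2'} = \frac{CN-n}{CN}\,\frac{\sigma^2}{nq}(1-\rho_2) + \frac{Q-Cq}{Q-C}\,\frac{\sigma^2}{CNq}\left[1+\rho_2(CN-1)\right] \qquad (19)$$

where $\bar X_2'$ is the mean for the sample when the enlarged units are the primary units, $\rho_2$ is the intra-class correlation between elements in the enlarged primary units, and $C$ is the number of original units combined to form each enlarged unit. Then

$$\sigma^2_{\bar X_1'} - \sigma^2_{\bar X_2'} = \frac{\sigma^2(q-1)(C-1)}{qN(Q-1)(Q-C)} + \frac{\sigma^2}{qN}\left(\rho_1w_1 - \rho_2w_2\right) \qquad (20)$$

where

$$w_1 = \frac{(Q-q)(N-1)}{Q-1} - \frac{N-n}{n} \quad\text{and}\quad w_2 = \frac{(Q-Cq)(CN-1)}{(Q-C)C} - \frac{CN-n}{Cn}.$$

Since

$$w_1 - w_2 = \frac{(C-1)(q-1)(QN-1)}{(Q-1)(Q-C)} \ge 0 \quad\text{and}\quad \frac{(q-1)(C-1)}{(Q-1)(Q-C)} \ge 0,$$

a gain is brought about by enlarging primary units whenever $\rho_1 > \rho_2$, where $\rho_1$ and $\rho_2$ are both positive.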
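Because (18) to (20) involve several rearrangements, a quick numerical check is useful. The sketch below codes (18) directly, obtains (19) by treating the population as $Q/C$ units of $CN$ elements each, and compares the difference with the closed form (20). The function names and the parameter values in the usage lines are arbitrary choices for the check.

```python
def var_mean(sigma2, rho, N, n, Q, q):
    """Eq. (18): variance of the sample mean when q of Q primary units
    (N elements each) are drawn and n elements are subsampled from each."""
    within = (N - n) / N * sigma2 / (n * q) * (1 - rho)
    between = (Q - q) / (Q - 1) * sigma2 / (N * q) * (1 + rho * (N - 1))
    return within + between


def var_mean_enlarged(sigma2, rho2, N, n, Q, q, C):
    """Eq. (19): the same design after combining every C original units
    (Q/C enlarged units of C*N elements)."""
    return var_mean(sigma2, rho2, C * N, n, Q / C, q)


def gain_eq20(sigma2, rho1, rho2, N, n, Q, q, C):
    """Eq. (20): closed form of var_mean - var_mean_enlarged."""
    w1 = (Q - q) * (N - 1) / (Q - 1) - (N - n) / n
    w2 = (Q - C * q) * (C * N - 1) / ((Q - C) * C) - (C * N - n) / (C * n)
    return (sigma2 * (q - 1) * (C - 1) / (q * N * (Q - 1) * (Q - C))
            + sigma2 / (q * N) * (rho1 * w1 - rho2 * w2))


params = dict(N=50, n=5, Q=300, q=20)
direct = var_mean(1.0, 0.3, **params) - var_mean_enlarged(1.0, 0.1, C=3, **params)
closed = gain_eq20(1.0, 0.3, 0.1, C=3, **params)
print(abs(direct - closed) < 1e-12)   # True
```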


2. Comparison of variances of certain alternative subsampling systems where the primary units are of unequal sizes. The development of (4), the formula for the difference between the variances of sample estimates compared in Sec. IV-2, is given below. We shall confine ourselves to the simple case where only one primary sampling unit is drawn into the sample from each stratum. Let

$$\bar X' = \sum_h^L N_h\bar X_h'/N \qquad (21)$$

be the sample estimate used for each of the three designs to be compared, where $\bar X_h' = \bar X_{hj}' = \sum_k X_{hjk}/n_{hj}$, and $X_{hjk}$ is the value of the $k$-th element in the $j$-th

⁹ For definitions and properties of intra-class correlations, see Secs. 38-40 of Statistical Methods for Research Workers, R. A. Fisher, and [5].

356 


MORRIS H. HANSEN AND WILLIAM N. HURWITZ 


primary unit in the $h$-th stratum; $L$ is the number of strata; $n_{hj}$ is the number of elements drawn into the sample from the $j$-th primary unit in the $h$-th stratum, with $N_{hj}$ the corresponding total number; $N_h = \sum_j^{Q_h} N_{hj}$, with $Q_h$ = the number of primary units in the $h$-th stratum; and $N = \sum_h^L N_h$. If the subsampling within a stratum is of a constant proportion, $C$, as in the first of the subsampling systems mentioned, $n_{hj}$ in the above estimate is equal to $CN_{hj}$. If the subsampling within a stratum is of a constant number, as in the second subsampling system mentioned, as well as in the recommended system, $n_{hj}$ is equal to $n_h = CN_h/Q_h$.

We shall denote the sample estimate for the first design by $\bar X_1'$, that for the second design by $\bar X_2'$, and that for the recommended design by $\bar X_3'$.

The expected values of the sample estimates for the first two designs, $\bar X_1'$ and $\bar X_2'$, are the same, and are equal to

$$E\bar X_1' = E\bar X_2' = \frac{1}{N}\sum_h\frac{N_h}{Q_h}\sum_j\bar X_{hj}$$

where $\bar X_{hj} = \sum_k X_{hjk}/N_{hj}$. Thus, since $\sum_j \bar X_{hj}/Q_h$ is not, in general, equal to $\sum_j N_{hj}\bar X_{hj}/\sum_j N_{hj} = \bar X_h$, both $\bar X_1'$ and $\bar X_2'$ are biased estimates of $\bar X$.
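The bias statement above can be checked on a single stratum. The sketch below assumes one primary unit is drawn and that the subsample mean is unbiased for the mean of the selected unit, so the expectation of the estimate is simply the probability-weighted average of the unit means. It contrasts equal probabilities $1/Q_h$ with probabilities $N_{hj}/N_h$, the selection rule of the recommended design. The unit sizes and means are invented for illustration.

```python
def stratum_expectations(sizes, means):
    """One stratum, one primary unit drawn. sizes = N_hj, means = unit means.
    Returns (true stratum mean, E[estimate] with equal probabilities,
    E[estimate] with probability proportionate to size)."""
    N_h = sum(sizes)
    true_mean = sum(N * m for N, m in zip(sizes, means)) / N_h
    equal_prob = sum(means) / len(sizes)                    # each unit w.p. 1/Q_h
    pps = sum(N / N_h * m for N, m in zip(sizes, means))    # unit j w.p. N_hj / N_h
    return true_mean, equal_prob, pps


# Hypothetical stratum: small units with high proportions, one large unit with a low one.
true_mean, equal_prob, pps = stratum_expectations([100, 400, 1500], [0.6, 0.5, 0.3])
print(round(true_mean, 3), round(equal_prob, 3), round(pps, 3))   # 0.355 0.467 0.355
```

With these numbers the equal-probability expectation exceeds the true mean because the small units, which carry the high means, are selected as often as the large one.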

For the recommended design, in which the primary unit is drawn with probability of selection proportionate to size and a constant number is taken from the sampled units within a stratum, the expected value of the sample estimate is

$$E\bar X_3' = \frac{1}{N}\sum_h N_h\sum_j\frac{N_{hj}}{N_h}\,\bar X_{hj} = \bar X \qquad (22)$$

and therefore the estimate for the recommended design is unbiased.

The mean square error of $\bar X_1'$ is

$$\sigma^2_{\bar X_1'} = \frac{1}{N^2}\sum_h\frac{N_h^2}{Q_h}\left[\sum_j\frac{N_{hj}-n_{hj}}{N_{hj}-1}\,\frac{\sigma^2_{hj}}{n_{hj}} + \sum_j(\bar X_{hj}-\bar X_h)^2\right] + (\tilde X - \bar X)^2 - \frac{1}{N^2}\sum_h N_h^2(\tilde X_h - \bar X_h)^2 \qquad (23)$$

where $\sigma^2_{hj} = \sum_k (X_{hjk} - \bar X_{hj})^2/N_{hj}$ is the variance between elements within the $j$-th primary sampling unit of the $h$-th stratum, $\tilde X_h = \sum_j \bar X_{hj}/Q_h$, $\tilde X = \sum_h N_h\tilde X_h/N$, $\bar X_{hj} = X_{hj}/N_{hj}$, and $\bar X_h = \sum_j\sum_k X_{hjk}/\sum_j N_{hj} = X_h/N_h$. The first term in the square bracket of (23) is the contribution of the variance within primary units. The second term in the square bracket is an approximation to the mean square error between primary units, and the remaining terms give the error in this approximation. The mean square error of $\bar X_2'$ is given by the same formula but with $n_{hj}$ replaced by $n_h$.
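The difference between $\sigma^2_{\bar X_1'}$ and $\sigma^2_{\bar X_2'}$ is

$$\sigma^2_{\bar X_1'} - \sigma^2_{\bar X_2'} = \frac{1}{CN^2}\sum_h\frac{N_h^2}{Q_h}\sum_j\frac{N_{hj}\sigma^2_{hj}}{N_{hj}-1}\left(\frac{1}{N_{hj}} - \frac{Q_h}{N_h}\right) \qquad (24)$$

which will be positive if $\sigma^2_{hj}/N_{hj}$ is negatively correlated with $N_{hj}$, as is almost invariably the case in practice (see Sec. VI). Thus, since ordinarily $\sigma^2_{\bar X_1'}$ is larger than $\sigma^2_{\bar X_2'}$, it will suffice to compare $\sigma^2_{\bar X_2'}$ and $\sigma^2_{\bar X_3'}$ to show that the recommended subsampling system is more efficient than either of the first two mentioned.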
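The sign of (24) can be checked numerically for one stratum. The sketch below computes, without the common factor $N_h^2/N^2$, the within-unit term of (23) under a constant subsampling proportion ($n_{hj} = CN_{hj}$) and under a constant number per unit, taken here to be $n_h = CN_h/Q_h$ so that both designs have the same expected sample size. It then compares their difference with the stratum's contribution to (24). The cluster sizes and variances are hypothetical, chosen so that $\sigma^2_{hj}$ rises with $N_{hj}$ while $\sigma^2_{hj}/N_{hj}$ falls, as in relationships (c) and (d) of Sec. VI.

```python
def within_terms(sizes, variances, C):
    """Within-unit part of (23) for one stratum, without the factor N_h^2/N^2,
    when one of Q_h units is drawn with equal probability.
    Design 1: n_hj = C * N_hj.  Design 2: n_h = C * N_h / Q_h (assumed here)."""
    Q = len(sizes)
    N_h = sum(sizes)
    n_h = C * N_h / Q
    d1 = sum((N - C * N) / (N - 1) * s2 / (C * N) for N, s2 in zip(sizes, variances)) / Q
    d2 = sum((N - n_h) / (N - 1) * s2 / n_h for N, s2 in zip(sizes, variances)) / Q
    return d1, d2


def eq24_stratum(sizes, variances, C):
    """The same stratum's contribution to (24), again without N_h^2/N^2."""
    Q = len(sizes)
    N_h = sum(sizes)
    return sum(N * s2 / (N - 1) * (1 / N - Q / N_h)
               for N, s2 in zip(sizes, variances)) / (C * Q)


sizes = [10, 40, 90]      # hypothetical N_hj
variances = [4, 8, 12]    # hypothetical sigma^2_hj; sigma^2/N falls as N rises
d1, d2 = within_terms(sizes, variances, C=0.1)
print(d1 > d2, abs((d1 - d2) - eq24_stratum(sizes, variances, C=0.1)) < 1e-12)  # True True
```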


The variance for the recommended design is

$$\sigma^2_{\bar X_3'} = \frac{1}{N^2}\sum_h N_h^2\left[\sum_j\frac{N_{hj}}{N_h}\,\frac{N_{hj}-n_h}{N_{hj}-1}\,\frac{\sigma^2_{hj}}{n_h} + \sum_j\frac{N_{hj}}{N_h}(\bar X_{hj}-\bar X_h)^2\right] \qquad (25)$$

For comparing the mean square error of $\bar X_2'$ with the variance of $\bar X_3'$, we shall define $\rho_{hj}$ as the intra-class correlation coefficient between elements within the $j$-th primary unit, where $\sigma^2_h$ is the variance between all elements within the $h$-th stratum. In this comparison, the terms outside the square brackets in (23) have been ignored because their contribution to the mean square error is either positive or negligible. Then,


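$$\sigma^2_{\bar X_2'} - \sigma^2_{\bar X_3'} = \frac{1}{N^2}\sum_h\frac{N_h^2}{Q_h}\sum_j\left(1 - \frac{Q_hN_{hj}}{N_h}\right)\left[\frac{N_{hj}\sigma^2_{hj}}{N_{hj}-1}\left(\frac{1}{n_h} - \frac{1}{N_{hj}}\right) + (\bar X_{hj}-\bar X_h)^2\right] \qquad (26)$$

The second term of this difference was given in Sec. IV-2 as the approximate difference, and the first term was neglected. To examine the relative magnitudes of the two terms we shall write

$$\frac{N_{hj}}{N_{hj}-1}\,\sigma^2_{hj} = \sigma^2_h(1-\delta_{hj}) \qquad (27)$$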

Then 





(28) 

2 2 1 ■Vfc 2 

i ' 

'Nhi ; 

jr. \ 



For the general class of populations given in Sec. VI the covariance between $\delta_{hj}$ and $N_{hj}$, and also that between $\rho_{hj}$ and $N_{hj}$, will be negative. Moreover, in many practical problems of this class the two covariances will be of approximately the same magnitude. In such instances the first term of (28) will be equal to $1/n_h$ times the second, and thus smaller than the second term for all $n_h > 1$, and much smaller for moderately large values of $n_h$. For example, in populations made up of clusters of different sizes for which the conditional probability of an element having a particular property for a fixed size of cluster is the same for all sizes of clusters, the two covariances will be very nearly equal. A number of practical problems approximate this situation. Moreover, even in situations where the covariance of $\delta_{hj}$ and $N_{hj}$ is several times that of $\rho_{hj}$ and $N_{hj}$, say 5 times as large, the second term will be larger than the first for all $n_h > 5$.

Some numerical illustrations of the gains obtained through the use of the 
recommended system are given in Sec. VII, and for some of the items for which 
results are summarized in that section the gains were substantial. 


3. The derivation of the variance formulas (13) and (9). The mean square error of a ratio of random variables is generally approximated from Taylor's expansion. If $X'$ and $Y'$ are random variables, $Y' > 0$, and $r$ is the population character of which $X'/Y' = r'$ is an estimate, then

$$E(r'-r)^2 = \frac{E\,Y'^2(r'-r)^2}{(EY')^2} + E\left[\left(1 - \frac{Y'^2}{(EY')^2}\right)(r'-r)^2\right] \qquad (29)$$

The first term in the right-hand side of (29) is a first approximation to the mean square error from Taylor's expansion, and the second term is the error in this approximation.
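The split in (29) is an algebraic identity, and its first term equals $E(X' - rY')^2/(EY')^2$ because $Y'^2(r'-r)^2 = (X' - rY')^2$, which is the usual linearized (Taylor) form. The sketch below verifies this on a toy population by enumerating every simple random sample of size $n$ drawn without replacement; that design is an assumption chosen only to make the expectations exact, not the paper's multi-stage design, and the population values are made up.

```python
from itertools import combinations


def ratio_mse_decomposition(xs, ys, n):
    """Exact E(r' - r)^2 and the two terms on the right of (29), averaging over
    all simple random samples of size n drawn without replacement."""
    N = len(xs)
    r = sum(xs) / sum(ys)                       # population ratio X/Y
    outcomes = []
    for s in combinations(range(N), n):
        x_est = N / n * sum(xs[i] for i in s)   # X'
        y_est = N / n * sum(ys[i] for i in s)   # Y' (positive here)
        outcomes.append((x_est, y_est))
    k = len(outcomes)
    ey = sum(y for _, y in outcomes) / k        # E Y'
    exact = sum((x / y - r) ** 2 for x, y in outcomes) / k
    # Y'^2 (r' - r)^2 == (X' - r Y')^2: the first-approximation numerator
    first = sum((x - r * y) ** 2 for x, y in outcomes) / k / ey ** 2
    error = sum((1 - y ** 2 / ey ** 2) * (x / y - r) ** 2 for x, y in outcomes) / k
    return exact, first, error


exact, first, error = ratio_mse_decomposition([7, 14, 28, 41, 62], [2, 5, 9, 14, 20], n=2)
print(abs(exact - (first + error)) < 1e-12)   # True
```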

Eq. (13), and as a special case (9), is derived as follows: 


(30) 


X(r' - r)* = X { 


■ t 1 s* 1 «»</ 

£ r £ S 53 

h ' ih % i ^ 


1 

.ZrZZZr. 

\ h th % j k 


- r>. 


hiik 

^ 1 1 mhii 

Let rf'kiik = YsiMnw - r), and T' = Z i £ E Z • 

h th i 7 k 


(31) 


h th i i k \ h th % i k / 


Then $E\,Y'^2(r'-r)^2/(EY')^2$ is the first approximation to the mean square error.

Since $EY'$ is evaluated in the same way as $EX'$ (8), it is merely necessary to evaluate $E\,Y'^2(r'-r)^2$, the numerator of the first approximation. Now


Cj-ZE Z M 

h th i i k J 


h th Kn th tq 


8h 1 ^h 

where = Z Z Z = Z • 

i i k i 
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Since -E 2 #2 ~ ^ ^ 22 22 


h i 


1 


. h i,r 
if^r 

1 


(32) EY^\r^ - r)-^ = E ^ 22 E ^ 2 : Mnr + ^ E r • 

^ ' h h i k h i,r h,q th tq 

The first term in the right-hand side of (32) is 

,f2 i ^ ^ ^5/ ‘^hij Mhif *" 

jhi tl h,i,i th Ph Mhi] Mhij 


iy^r 


('o 

hyAq 


(33) 


TTT ./2 1 ^ 1 Mhii — yn^hii ^ .2 
The second term of (32) is
(34) fcY - £ i E 

^ A .> A.,- ft \ < Mkii / A., ft Pa . Mia 

where 


•yr 


^Ai,- = ; 

t 


and the third term of (32) is


hf^ 


Therefore EY'^ (r' — rf = (33) + (34) + (35), and when Yhi,k{rhiit> — r) is 
substituted for ^'hUt we have 

£!"’(/ - r)' - £ i [s ^ ^ £ y. 

_i_ V PM'fnhii mkij — 1 (V V....U... _ -m^ 

"t p w w- •• \-Zm^ * hijkyX^iik wJ 
By substituting $(r_{hijk} - r_{hij} + r_{hij} - r)^2$ for $(r_{hijk} - r)^2$ in the first term of
(36) and PmM hif/ P m, - mhij for 1/th in the 1st, 2nd, and 4th terms, the sum of these 
three terms becomes 


E P hi Mhii P la P V® 

"n -r;*- f Mi ^ laik\Tliiik — TMi) 

h,i,i,k fh mhii Pkij 


(37) 


+ 22^ ^ f MiYhiikiXhiik — rhii)(rhii — r) 


Ph mhii Pf 


I v* p r V V® 

1 jLmJ D ^ r>2 ^ hiivkij 2Lmi ^ hiik -mr I 

iy P hii L * 


K%%i Ph P\ii 

where Phij ~ ““ '^hi^/ij^hii ““ 1) 8»n.d. Yfiijk* 

k k 

When we substitute the appropriate value for 1/th in the 3rd, 5th, and 6th 
terms of (36), the sum of these terms becomes 

12 


(38) 


Z ^’TZ ^ Yhiiirhii - r)T - £ fe ^ YhiiiTMi - r)T 

h,j L » hii J h L **y ^ ^ hij J 
Now

(39) 


r ^ YMiiTMi - r) = E Phi (1^' - = P»(iJw(^, - fRhiuy.r) 

i ^hii i \^hii iThij / 

= PhRhHA)-.r(fhHA) ~ ’’) 

where haA) - RhHA)/RhHA)ir , and 

Z^* ^ YMi(rhii — r) = ZPa»(®*jm) “ »'PA/U):r) = Pa(Pau) — r2?A(4).r) 
(40) <«»■ PhPhii i 

— PhRhlAlxrifhiA) ~ r) 

where fA(^) = RhiA)/Rh(A)-.r . 

Substituting (39) and (40) in (38), we have 

E (PA,/PA)Plfl«u):r(fA,(^) - - E PlRluyAfhu) “ rf 

h 

+ [2 PhRh(A):Y{fh(A) r)]*. 
h 

By substituting $(r_{hi(A)} - r_{h(A)} + r_{h(A)} - r)^2$ for $(r_{hi(A)} - r)^2$ in the first term in (41) and expanding, (41) becomes

Z Ph^ RhUAitrifkiiA) — fiu))* + 2 E P*^’ PwM)!r(^yM) — ^A(ii)(fAa) — r) 

- ■ Ph h.i Ph 


(42) 


ki 


+ ZPA(i'AfA) — »‘)*^E^^^A/U):r — PA(A):rJ + [E PhRh(Ahr(fk{A) — »•)]* 
Hence, since $E\,Y'^2(r'-r)^2 = (37) + (42)$,

P Stf hijk^Xhijk 1"*!;) 

^kij h 


Ph Mhii — 1 


W-At; Mhij P h 


^ ^ , P*,- M*.v - m*.7 ? ^**'*('’*”* »■) 

*• P* M*,, - 1 m«,M«ynv 

I V' E>2 ““ 2 * (>•**; — rY 

" I ^hi -mr 1 ^htjlY ~2 

fc.t.j P* Mfciy — 1 rrihijTlii 

+ 2 PhiPhj/Ph)Rhj(A):Y(Thi(A) — 

h,3 

+ 2 ^ Ph(Phi/Ph)Rh3iA):Y(hi(A) fh(A))(fh(A) — 0 

h.i 

+ 23 Ph(h(A) — rY{Phj/Ph){RhiiAy.Y — Rkay.yY 

Ki 

+ [23 PhRhU)lY{fh(A) “ 


7 wi j 

where $\sigma^2_{hij:Y} = \sum_k (Y_{hijk} - \bar Y_{hij})^2/M_{hij}$ and $\bar Y_{hij} = \sum_k Y_{hijk}/M_{hij} = Y_{hij}/M_{hij}$.

The approximation to $E(r'-r)^2$ is given by (43) divided by $(EY')^2$. By ignoring the 2nd, 5th, and 7th terms, which are negligible for a large class of populations, we obtain (13).

The variance of $X'$ is derived from (43) by simply substituting $P_{hij}/P$ for $Y_{hijk}$ in (43). This follows from the considerations given below.

Since $r' = X'/Y'$, and $X'$ is the numerator of $r'$, $\sigma^2_{X'}$ is given by $\sigma^2_{r'}$ when the denominator, $Y'$, is identically equal to unity in repeated samplings.

O* ^ ^^hii P hi Phi p /c\ 

Smce - = ^ from (6), 

th 'f^hiiPhij ^hijPhij 

the denominator of r' which is equal to 

±t±'t‘=^ Yhijk , Will be identically equal to unity in repeated sampling 

h i j k ftlMj^hij 

when $Y_{hijk}$ is set equal to $P_{hij}/P$, where $P = \sum_h P_h$.

The formula (9) for the mean square error of X' is, of course, exact, since the error term

E\Y^/(jEY'f)[r’ - r}* = 0. 

It may be pointed out that $\sigma^2_{X'}$ may be obtained directly, and more simply, without the use of (29), since X' is not estimated from a ratio of random variables.

The error term of the approximation $(43)/(EY')^2$ to $E(r' - r)^2$, as given by (29), cannot be expressed as a simple function of the individual observations, but useful maxima and minima for it may be obtained. Upper and lower bounds for the variance of r' follow simply from the following inequalities, which hold independently of the joint distribution of X' and Y'.

$$\frac{E[Y'^2 (r' - r)^2]}{Y'^2_{\max}} \le E(r' - r)^2 \le \frac{E[Y'^2 (r' - r)^2]}{Y'^2_{\min}} \tag{44}$$


where $Y'_{\max}$ is the maximum value of Y', obtained simply by choosing or estimating the largest $Y_h$ for each stratum. $Y'_{\min}$ (the minimum value of Y') is obtained in a similar manner.

When evaluated, Eq. (44) becomes

$$\frac{(EY')^2 E_B^2}{Y'^2_{\max}} \le E(r' - r)^2 \le \frac{(EY')^2 E_B^2}{Y'^2_{\min}}, \tag{45}$$

where $(EY')^2 E_B^2$ is given by (43).

Eq. (45) will serve adequately as an indicator of the accuracy of the approximation for sampling systems in which the variability of the Y's within strata is restricted. However, in other designs, where stratification is not used and the variability in the Y's is not restricted, the limits given by (45) may be too broad to be useful.
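The inequalities (44) hold outcome by outcome. The reason is that $(r' - r)^2 = (X' - rY')^2/Y'^2$ and $Y'_{\min} \le Y' \le Y'_{\max}$, so taking expectations preserves the ordering. The sketch below checks this numerically on a hypothetical, equally weighted set of outcomes (X', Y').

- It does not reproduce the stratified expression (43).
- The function name `ratio_mse_bounds` is an illustrative assumption, and so are the simulated data.

```python
import random


def ratio_mse_bounds(pairs, r, y_min, y_max):
    """Lower bound, exact value and upper bound of E(r' - r)^2, as in (44).

    `pairs` holds equally likely outcomes (X', Y') of the estimates; r is the
    population ratio and every Y' lies in [y_min, y_max].
    """
    m = len(pairs)
    exact = sum((x / y - r) ** 2 for x, y in pairs) / m
    # Y'^2 (r' - r)^2 equals (X' - r Y')^2, so this is E[Y'^2 (r' - r)^2]
    weighted = sum((x - r * y) ** 2 for x, y in pairs) / m
    return weighted / y_max ** 2, exact, weighted / y_min ** 2


# Hypothetical outcomes: Y' uniform on [2, 5], X' equal to 3 Y' plus noise
rng = random.Random(1)
outcomes = []
for _ in range(1000):
    y = rng.uniform(2.0, 5.0)
    outcomes.append((3.0 * y + rng.gauss(0.0, 1.0), y))

lower, exact, upper = ratio_mse_bounds(outcomes, r=3.0, y_min=2.0, y_max=5.0)
print(lower, exact, upper)
```

The interval is narrow only when $Y'_{\max}/Y'_{\min}$ is close to 1. This is the point made above about designs in which the variability of the Y's is not restricted.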


REFERENCES 

[1] A. L. Bowley, "Measurement of the precision attained in sampling," Bulletin de l'Institut International de Statistique, Tome XXII (1926), 1ère Livraison.

[2] W. G. Cochran, "The use of analysis of variance in enumeration by sampling," Jour. Amer. Stat. Assoc., Vol. 34 (1939), pp. 492-510.

[3] W. G. Cochran, "Sampling theory when the sampling units are of unequal sizes," Jour. Amer. Stat. Assoc., Vol. 37 (1942), pp. 199-212.

[4] Calvert L. Dedrick and Morris H. Hansen, Final Report on Total and Partial Unemployment, Vol. IV, The Enumerative Check Census, U. S. Govt. Printing Office (1938).

[5] Morris H. Hansen and William N. Hurwitz, "Relative efficiencies of various sampling units in population inquiries," Jour. Amer. Stat. Assoc., Vol. 37 (1942), pp. 89-94.

[6] R. J. Jessen, Statistical Investigation of a Sample Survey for Obtaining Farm Facts, Research Bulletin 304 (1942), Iowa State College, Ames, Iowa.

[7] P. C. Mahalanobis, "A sample survey of the acreage under jute in Bengal," Sankhyā, Vol. 4 (1940), pp. 511-530.

[8] J. Neyman, "On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection," Jour. Royal Stat. Soc., New Series, Vol. 97 (1934), pp. 558-606.

[9] J. Neyman, "Contribution to the theory of sampling human populations," Jour. Amer. Stat. Assoc., Vol. 33 (1938), pp. 101-116.

[10] F. Yates and I. Zacopanay, "The estimation of the efficiency of sampling, with special reference to sampling for yield in cereal experiments," Jour. Agr. Sci., Vol. 25 (1935), pp. 543-577.

[11] Bureau of the Census, "A revised sample for current surveys," February 1943.



MULTIPLE SAMPLING WITH CONSTANT PROBABILITY 

By Walter Bartky 
The University of Chicago 

1. Introduction. In an attempt to reduce inspection costs, manufacturers have frequently resorted to sampling procedures in which the disposition of an aggregate or lot of similar units does not necessarily depend upon the results of a single sample. In practice, however, the number of permissible additional samples is limited to one or two. Nevertheless, if the lot is very large, an appreciable reduction in the expected sample size may be accomplished by allowing a greater number of additional samples.

In this article probability formulae will be derived for an inspection procedure for infinite lots in which the number of additional samples is not limited and may be any number, depending upon the results of the sampling. This development will be limited to the simple case of attribute inspection, in which the units fall into two categories: satisfactory units or defective units. If p denotes the fraction defective in an infinite lot, then the probability of finding exactly m defective units, or defects, in a sample of n is

$$P(m, n) = \binom{n}{m} p^m q^{n-m}, \qquad q = 1 - p. \tag{1}$$

Since P(m, n) is the probability of m successes in n trials with constant probability of success p, the results are applicable to other situations involving repeated trials with constant probability of success. The terminology of commercial inspection will nevertheless be used in this article.

In contrast with multiple sampling, a single sample inspection procedure for lots of the type here considered is one in which a lot of units is accepted or rejected on the basis of the number of defective units found in the sample. Thus a lot is accepted if the number of defects is at most an integer c, the "acceptance number," and rejected if the number exceeds c. For an infinite lot containing a fraction p of defects and a sample of n units, the probability of acceptance is, by (1),

$$\Pi(c, n) = \sum_{m=0}^{c} P(m, n), \tag{2}$$

and the probability of rejection is the difference between this sum and unity.
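Equations (1) and (2) translate directly into code. The sketch below uses Python's `math.comb` and returns zero for arguments outside 0 ≤ m ≤ n, which is the convention the article adopts later for (6). The function names are illustrative only.

```python
from math import comb


def binom_prob(m, n, p):
    """P(m, n) of equation (1): probability of exactly m defects in a sample of n."""
    if m < 0 or m > n:
        return 0.0  # zero outside 0..n, the convention adopted later in the paper
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def single_sample_accept(c, n, p):
    """Pi(c, n) of equation (2): accept when at most c defects are found."""
    return sum(binom_prob(m, n, p) for m in range(c + 1))


print(single_sample_accept(0, 2, 0.5))  # q^2 = 0.25
print(single_sample_accept(2, 20, 0.05))
```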

2. Multiple sampling. The procedure in multiple sampling is to examine first an initial sample of $n_0$ units. If the number of defects in this initial sample is at most c, the lot is accepted. If the number of defects exceeds c + k (k an integer), the lot is rejected. But if the number of defects is greater than c and less than c + k + 1, an additional sample is removed and examined. In the latter case similar criteria determine whether the lot is to be accepted, rejected, or this method of sampling continued. With an infinite lot this scheme of sampling has an infinite variety of forms, but there are certain advantages in limiting this discussion to the following type of multiple sampling procedure.

I. Sample Sizes: The initial sample is of $n_0$ units, but all additional samples are of the same size, namely n units.

II. Condition for Acceptance: The lot is accepted if the number of defects in the initial sample of $n_0$ units is at most c. It is also accepted if, after taking r additional samples of n, the total number of defects in the $n_0 + rn$ units examined equals c + r.

III. Condition for Rejection: The lot is rejected if the number of defects in the initial sample of $n_0$ units exceeds c + k. It is also rejected if, after taking r additional samples of n, the total number of defects exceeds c + r + k.

IV. Condition for an Additional Sample: An additional sample of n is taken only if neither condition II nor condition III is realized.

Thus in this sampling scheme the level for acceptance, as well as the level for rejection, increases by unity for each additional sample of n. If at the r-th additional sample a lot is neither accepted nor rejected, then the total number of defects in initial plus additional samples must equal one of the k numbers

$$c + r + 1,\ c + r + 2,\ \ldots,\ c + r + k.$$

Denote the probabilities for obtaining these numbers by

$$P_1(r),\ P_2(r),\ \ldots,\ P_k(r), \tag{3}$$

respectively, the subscript indicating the number of defects in excess of the acceptance level.

To be accepted on the (r + 1)-st additional sample, two things must happen:

(a) no defect must be found in the (r + 1)-st additional sample, and
(b) a total of c + r + 1 defects must be found in previous samples.

The probability of (a) is given by (1), taking m equal to zero, and the probability of (b) is the first one in the set (3). Consequently the probability of accepting a lot on the (r + 1)-st additional sample is

$$P_0(r + 1) = q^n P_1(r).$$

If Π denotes the probability of eventually accepting the lot,

$$\Pi = \sum_{m=0}^{c} P(m, n_0) + q^n [P_1(0) + P_1(1) + P_1(2) + \cdots], \tag{4}$$

where the first term on the right is the probability of accepting on the initial sample and may be evaluated by means of (1). Furthermore

$$P_i(0) = P(c + i, n_0), \tag{5}$$

which is, by (1), the probability of finding c + i defects in the initial sample.

According to the notation (3), the probability of finding a total of c + r + 1 + i defects in initial plus r + 1 additional samples (that is, i more defects than the acceptance level) is $P_i(r + 1)$. These probabilities may be expressed as linear combinations of the probabilities (3), with coefficients that are probabilities of the type (1). Thus

$$P_i(r + 1) = \sum_j P(i - j + 1, n)\, P_j(r), \tag{6}$$

where the sum may be made to extend over j = 1, 2, ⋯, k, provided one defines (1) as equal to zero for negative m. By repeated application of this linear transformation it is possible to express the probabilities (3) for additional samples in terms of the probabilities (5) for the initial sample. Let M denote the k × k square matrix with elements

$$M_{ij} = P(i - j + 1, n) \qquad (i, j = 1, \ldots, k). \tag{7}$$

Omitting subscripts and regarding P(r) as a vector with elements given by (3), the linear transformation may be written

$$P(r + 1) = M P(r). \tag{8}$$

Hence, by repeated application of (8),

$$P(r) = M^r P(0) \qquad (r = 0, 1, 2, \ldots), \tag{9}$$

provided the zero power of the matrix M is defined as the identity matrix I.

The probability $P_i(r)$ cannot exceed the probability of finding exactly c + r + i defects in a single sample of $n_0 + rn$ units, that is, in the notation of (1), the probability $P(c + r + i, n_0 + rn)$. Since the latter probabilities approach zero as r approaches infinity, the limit of the elements of P(r) as r approaches infinity is zero. Thus with this multiple sampling procedure a lot is eventually either accepted or rejected. Furthermore, the matrix M contains no negative elements and P(0) may be chosen with all positive elements. It follows that the elements of $M^r$ approach zero as r approaches infinity, or

$$\lim_{r \to \infty} M^r = \text{the zero matrix}. \tag{10}$$

It can be demonstrated that, since the limit (10) is the zero matrix, the sum of the infinite geometric series in the matrix M is

$$I + M + M^2 + \cdots = (I - M)^{-1}, \tag{11}$$

where the right member is the reciprocal of the matrix I − M. Consequently the infinite sum of vectors is

$$V = \sum_{r=0}^{\infty} P(r) = (I - M)^{-1} P(0). \tag{12}$$

This infinite sum of vectors has elements $V_1, V_2, \ldots, V_k$, of which the first element is the sum in brackets occurring in the right member of (4). Hence the probability of eventually accepting the lot is

$$\Pi = \sum_{m=0}^{c} P(m, n_0) + q^n V_1. \tag{13}$$

By (12) and (5), this probability is expressible in terms of the probabilities for the initial sample, equations (1), and the reciprocal of the matrix I − M.

In addition to the probability of acceptance, one is also interested in the expected number, E, of additional samples. The quantity

$$\sum_i P_i(r - 1) \qquad (r = 1, 2, 3, \ldots),$$

where the sum extends over all i = 1, 2, ⋯, k, is the probability of continuing to the r-th sample. It follows that

$$\sum_i P_i(r - 1) - \sum_i P_i(r)$$

is the probability that the lot will be either accepted or rejected on the r-th sample. Therefore the expected number of additional samples is

$$E = \sum_{r > 0} r \Big[\sum_i P_i(r - 1) - \sum_i P_i(r)\Big] = \sum_{r \ge 0} \sum_i P_i(r),$$

or, on interchanging the order of summation and applying (12),

$$E = \sum_i V_i. \tag{14}$$

That is, the expected number of additional samples equals the sum of the elements of the vector V.
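The matrix formulation lends itself directly to computation. The following is a minimal sketch, not the author's procedure:

- It builds M from (7) and P(0) from (5).
- It obtains V of (12) by solving the linear system (I − M)V = P(0) with plain Gaussian elimination.
- It returns Π from (13) and E from (14).

The function names, and the hand-written solver used to avoid external dependencies, are assumptions of the sketch.

```python
from math import comb


def binom_prob(m, n, p):
    """P(m, n) of equation (1), taken as zero outside 0 <= m <= n."""
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for cc in range(col, size + 1):
                aug[r][cc] -= factor * aug[col][cc]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][cc] * x[cc] for cc in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def multiple_sampling(p, n0, n, c, k):
    """Return (Pi, E): acceptance probability (13) and expected additional samples (14)."""
    q = 1.0 - p
    # (7): M_ij = P(i - j + 1, n) for i, j = 1..k, stored with 0-based indices
    m_mat = [[binom_prob(i - j + 1, n, p) for j in range(1, k + 1)]
             for i in range(1, k + 1)]
    # (5): P_i(0) = P(c + i, n0)
    p0 = [binom_prob(c + i, n0, p) for i in range(1, k + 1)]
    # V = (I - M)^{-1} P(0) from (12), obtained by solving (I - M) V = P(0)
    i_minus_m = [[(1.0 if i == j else 0.0) - m_mat[i][j] for j in range(k)]
                 for i in range(k)]
    v = solve_linear(i_minus_m, p0)
    accept_initial = sum(binom_prob(m, n0, p) for m in range(c + 1))
    return accept_initial + q ** n * v[0], sum(v)


# Plan from the illustration at the end of the article: c = 0, n = 2, n0 = k + 1 = 4
pi, e = multiple_sampling(p=0.6, n0=4, n=2, c=0, k=3)
print(round(pi, 4))   # 1 / (1 + 1.5**4) = 0.1649
print(4 + 2 * e)      # expected total number of units examined, n0 + nE
```

A negative c is handled naturally: the initial-acceptance sum is then empty, and every P(c + i, n0) with a negative argument is zero.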

Though it is possible to develop a general expression for the reciprocal of the matrix I − M, it is only necessary to evaluate V in order to determine the acceptance probability Π and the expected number of additional samples. Now by (12) this vector is the solution of the linear system of equations

$$(I - M)\, V = P(0). \tag{15}$$

For k small this system could be solved directly. To find a form of the solution applicable for any value of k, let the expansion of $[(px + q)^n - x]^{-1}$ in a power series in x be

$$[(px + q)^n - x]^{-1} = g_1 + g_2 x + g_3 x^2 + \cdots, \tag{16}$$

where the coefficients g are functions of p and q. On clearing of fractions and equating coefficients of like powers of x, it is found that

$$g_1 = q^{-n}, \tag{17}$$

and, by equating the coefficients of the first k powers of x and using the notation (7),

$$g_i - \sum_j M_{ij}\, g_j = \begin{cases} 0 & (i = 1, 2, \ldots, k - 1), \\ q^n g_{k+1} & (i = k). \end{cases} \tag{18}$$
Similarly, let the power-series expansion be

$$\frac{\sum_i P_i(0)\, x^i}{(px + q)^n - x} = h_1 + h_2 x + h_3 x^2 + \cdots, \tag{19}$$

where the sum is for all i = 1, ⋯, k. Then by clearing of fractions and equating coefficients of like powers of x, it is found that

$$h_1 = 0, \tag{20}$$

and

$$h_i - \sum_j M_{ij}\, h_j = \begin{cases} -P_i(0) & (i = 1, \ldots, k - 1), \\ -P_k(0) + q^n h_{k+1} & (i = k). \end{cases} \tag{21}$$

It follows from equations (18) and (21) that if

$$V_i = g_i\, h_{k+1}/g_{k+1} - h_i \qquad (i = 1, \ldots, k), \tag{22}$$

then V, the vector with these elements, will satisfy equation (15). By (17) and (20),

$$V_1 = q^{-n} h_{k+1}/g_{k+1}. \tag{23}$$

Therefore, by (13), the probability of eventually accepting the lot is expressible as

$$\Pi = \sum_{m=0}^{c} P(m, n_0) + h_{k+1}/g_{k+1}, \tag{24}$$

while the expected number of additional samples is the sum of the elements (22) of V.

These results will now be summarized, and simplified formulae derived for special cases. In the summary all probabilities are expressed by means of (5) in terms of the probabilities (1).


3. Summary of multiple sampling formulas. For this multiple sampling procedure the initial sample is $n_0$ and the additional samples are n. A lot is accepted if on the r-th additional sample the total number of defects found is at most c + r, and rejected if the total exceeds c + r + k. An infinite lot containing a fraction p of defects is either accepted or rejected, the probability of acceptance being given by

$$\Pi = \sum_{m=0}^{c} \binom{n_0}{m} p^m q^{n_0 - m} + h_{k+1}/g_{k+1} \qquad (q = 1 - p), \tag{25}$$

and the probability of rejection is 1 − Π. The expected number of additional samples is

$$E = \frac{h_{k+1}}{g_{k+1}} \sum_i g_i - \sum_i h_i, \tag{26}$$

where the sum extends over i = 1, 2, ⋯, k. The $g_i$ and $h_i$ are the coefficients of the power series in x in the expansions

$$[(px + q)^n - x]^{-1} = g_1 + g_2 x + g_3 x^2 + \cdots, \tag{27}$$

$$\frac{\sum_i \binom{n_0}{c + i} p^{c+i} q^{n_0 - c - i}\, x^i}{(px + q)^n - x} = h_1 + h_2 x + h_3 x^2 + \cdots, \tag{28}$$

where the sum is for all i = 1, 2, ⋯, k. These formulae apply to all finite values of c and k. The proviso is that the binomial coefficient is taken as zero for values of the argument falling outside those occurring in the ordinary expansion of an integral power of a binomial.
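As a cross-check of the summary formulas, the sketch below does the following:

- It computes the coefficients $g_i$ and $h_i$ by long division of the power series (27) and (28).
- It evaluates Π from (25) and E from (26).
- It compares both with a truncated direct summation of (4) and (14), using the recursion (8).

The truncation length `steps`, the example parameters and all function names are illustrative assumptions.

```python
from math import comb


def binom_prob(m, n, p):
    """P(m, n) of equation (1), zero outside 0 <= m <= n."""
    if m < 0 or m > n:
        return 0.0
    return comb(n, m) * p ** m * (1.0 - p) ** (n - m)


def series_quotient(num, den, terms):
    """First `terms` coefficients of num(x) / den(x); den[0] must be nonzero."""
    out = []
    for j in range(terms):
        acc = num[j] if j < len(num) else 0.0
        for i in range(1, min(j, len(den) - 1) + 1):
            acc -= den[i] * out[j - i]
        out.append(acc / den[0])
    return out


def summary_formulas(p, n0, n, c, k):
    """Pi from (25) and E from (26), via the coefficients of (27) and (28)."""
    den = [binom_prob(m, n, p) for m in range(n + 1)]
    den[1] -= 1.0                                   # (px + q)^n - x
    num = [0.0] + [binom_prob(c + i, n0, p) for i in range(1, k + 1)]
    g = series_quotient([1.0], den, k + 1)          # g[0] = g_1, ..., g[k] = g_{k+1}
    h = series_quotient(num, den, k + 1)
    ratio = h[k] / g[k]
    accept = sum(binom_prob(m, n0, p) for m in range(c + 1)) + ratio
    return accept, ratio * sum(g[:k]) - sum(h[:k])


def direct_sum(p, n0, n, c, k, steps=2000):
    """Truncated (4) and (14): iterate P(r + 1) = M P(r) of (8) and accumulate."""
    q = 1.0 - p
    state = [binom_prob(c + i, n0, p) for i in range(1, k + 1)]
    accept = sum(binom_prob(m, n0, p) for m in range(c + 1))
    expected = 0.0
    for _ in range(steps):
        accept += q ** n * state[0]                 # acceptance on the next sample
        expected += sum(state)
        state = [sum(binom_prob(i - j + 1, n, p) * state[j - 1] for j in range(1, k + 1))
                 for i in range(1, k + 1)]
    return accept, expected


print(summary_formulas(0.2, 6, 4, 1, 3))
print(direct_sum(0.2, 6, 4, 1, 3))
```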


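4. Computation of coefficients g and h. The denominator in (27) is first expanded in a power series in $x(px + q)^{-n}$. The resulting negative powers of binomials are then expanded in power series in x, and it is found that

$$g_k = \sum_{m=0}^{k-1} (-1)^m \binom{(k - m)n + m - 1}{m} p^m q^{-(k - m)n - m}. \tag{29}$$

By (28) the coefficients h are expressible in terms of the g's:

$$h_1 = 0, \qquad h_{j+1} = \sum_{i=1}^{j} \binom{n_0}{c + i} p^{c+i} q^{n_0 - c - i}\, g_{j+1-i} \qquad (j = 1, \ldots, k). \tag{30}$$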
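An explicit sum for the g's can be checked against long division of (27). The printed formula is damaged in the scan, so the sketch below assumes that sum in the following form:

$$g_k = \sum_{m=0}^{k-1} (-1)^m \binom{(k-m)n + m - 1}{m} p^m q^{-(k-m)n - m}.$$

This form follows from writing $1/[(px+q)^n - x] = \sum_j x^j (px+q)^{-n(j+1)}$ and expanding each negative binomial power. Agreement therefore confirms that reading, not the original typography. The function names are illustrative.

```python
from math import comb


def g_closed_form(k, n, p):
    """g_k as an explicit alternating sum of binomial terms (assumed form of (29))."""
    q = 1.0 - p
    return sum((-1) ** m * comb((k - m) * n + m - 1, m) * p ** m * q ** (-(k - m) * n - m)
               for m in range(k))


def g_series(count, n, p):
    """g_1..g_count by long division of 1 / ((px + q)^n - x), equation (27)."""
    q = 1.0 - p
    den = [comb(n, m) * p ** m * q ** (n - m) for m in range(n + 1)]
    den[1] -= 1.0
    g = []
    for j in range(count):
        acc = 1.0 if j == 0 else 0.0
        for i in range(1, min(j, n) + 1):
            acc -= den[i] * g[j - i]
        g.append(acc / den[0])
    return g


series = g_series(6, 5, 0.1)
for k in range(1, 7):
    print(k, g_closed_form(k, 5, 0.1), series[k - 1])
```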


Other expressions for the coefficients may be derived from the theory of functions of a complex variable. Thus, by Cauchy's Integral Formula,

$$g_{k+1} = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{dx}{x^{k+1}[(px + q)^n - x]}, \qquad h_{k+1} = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{S(x)\, dx}{x^{k+1}[(px + q)^n - x]}, \tag{31}$$

where

$$S(x) = \sum_i \binom{n_0}{c + i} p^{c+i} q^{n_0 - c - i}\, x^i, \tag{32}$$

and the closed path of integration C in the complex plane includes only the pole at the origin. The integrands are rational functions, and the point at infinity is not a singularity for either integrand. Hence these integrals taken about the origin are equal to the negative sum of the corresponding integrals taken about the zeros of

$$(px + q)^n - x.$$

If p ≠ 1/n, it can be demonstrated that there are n distinct zeros $x_1, x_2, \ldots, x_n$, corresponding to the solutions of the algebraic equation

$$(p x_s + q)^n = x_s \qquad (s = 1, \ldots, n). \tag{33}$$

One solution is obviously

$$x_1 = 1, \tag{34}$$

and for p = 1/n this solution is a double root.

The integrals about these zeros are obtainable from Cauchy's Integral Formula. After integrating and simplifying the resulting sum by means of (33), it is found that, for the case p ≠ 1/n,

$$g_{k+1} = \frac{1}{1 - np} + \sum_{s=2}^{n} \frac{p x_s + q}{x_s^{k+1}[q - (n - 1)p x_s]}, \qquad h_{k+1} = \frac{S(1)}{1 - np} + \sum_{s=2}^{n} \frac{(p x_s + q)\, S(x_s)}{x_s^{k+1}[q - (n - 1)p x_s]}. \tag{35}$$

If the power series (27) is multiplied by the series

$$(1 - x)^{-1} = 1 + x + x^2 + x^3 + \cdots,$$

the resulting product is

$$\frac{1}{(1 - x)[(px + q)^n - x]} = g_1 + (g_1 + g_2)x + (g_1 + g_2 + g_3)x^2 + \cdots,$$

so that, by Cauchy's Integral Formula,

$$G_k = \sum_i g_i = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{dx}{x^k (1 - x)[(px + q)^n - x]}. \tag{36}$$

Similarly, the sum of the coefficients h that occur in the right member of (26) may be written

$$H_k = \sum_i h_i = \frac{1}{2\pi\sqrt{-1}} \oint_C \frac{S(x)\, dx}{x^k (1 - x)[(px + q)^n - x]}. \tag{37}$$
The integrals (36) and (37) are of the same type as (31). Employing the same method of integration used in deriving (35) gives the following expressions for the sums of coefficients g and h occurring in (26):

$$G_k = \sum_i g_i = \frac{k}{1 - np} - \frac{n(n - 1)p^2}{2(1 - np)^2} + \sum_{s=2}^{n} \frac{p x_s + q}{x_s^k (1 - x_s)[q - (n - 1)p x_s]},$$

$$H_k = \sum_i h_i = \frac{k S(1) - S'(1)}{1 - np} - \frac{n(n - 1)p^2 S(1)}{2(1 - np)^2} + \sum_{s=2}^{n} \frac{(p x_s + q)\, S(x_s)}{x_s^k (1 - x_s)[q - (n - 1)p x_s]}, \tag{38}$$

provided np ≠ 1. Here S'(1) is the derivative of (32) with respect to x evaluated for x = 1. For the special case np = 1, two of the roots of (33) coincide,

$$x_1 = x_2 = 1,$$

and the integrals (31), (36) and (37) become, respectively,

$$(n - 1)\, g_{k+1} = 2kn + \tfrac{8}{3}n - \tfrac{4}{3} + \sum_s \frac{x_s + n - 1}{x_s^{k+1}(1 - x_s)},$$

$$(n - 1)\, h_{k+1} = \left(2kn + \tfrac{8}{3}n - \tfrac{4}{3}\right) S(1) - 2n S'(1) + \sum_s \frac{(x_s + n - 1)\, S(x_s)}{x_s^{k+1}(1 - x_s)},$$

$$(n - 1) \sum_i g_i = k^2 n + \tfrac{5}{3}kn - \tfrac{4}{3}k + \tfrac{n}{18} - \tfrac{1}{18} - \tfrac{1}{9n} + \sum_s \frac{x_s + n - 1}{x_s^k (1 - x_s)^2},$$

$$(n - 1) \sum_i h_i = \left(k^2 n + \tfrac{5}{3}kn - \tfrac{4}{3}k + \tfrac{n}{18} - \tfrac{1}{18} - \tfrac{1}{9n}\right) S(1) - \left(2kn + \tfrac{2}{3}n - \tfrac{4}{3}\right) S'(1) + n S''(1) + \sum_s \frac{(x_s + n - 1)\, S(x_s)}{x_s^k (1 - x_s)^2}, \tag{39}$$

where the sum extends over all roots of (33) that are not equal to unity. Here S'(1) and S''(1) are the first and second derivatives of (32) with respect to x, for x = 1 and p = 1/n.

Formulas (35), (38) and (39) require for their evaluation the solutions of equation (33). For n greater than unity there are just two positive real solutions, say $x_1 = 1$ and $x_2$. If n is even, all other roots are complex numbers; if n is odd, they are complex with the exception of one negative real root. Consequently, by (33), for s = 3, 4, ⋯, n the absolute values of the roots satisfy the inequality

$$(p|x_s| + q)^n > |x_s|,$$

and consequently $|x_s|$ cannot lie between $x_1$ and $x_2$. But equation (33) may be written

$$\frac{(p x_s + q)^n - 1}{(p x_s + q) - 1} = \frac{1}{p} \qquad (s \ne 1),$$

so that, for s = 2, 3, ⋯, n,

$$\sum_i (p x_s + q)^i = 1/p, \tag{40}$$

where the sum is taken for i = 0, 1, ⋯, n − 1. Therefore

$$\sum_i (p|x_s| + q)^i > 1/p \qquad (s = 3, 4, \ldots, n).$$

Now $x_2$ is the only real and positive solution of (40). Consequently, in order to satisfy the inequality, the absolute values of the roots corresponding to s = 3, 4, ⋯, n must exceed $x_2$. On combining this result with the former, it follows that

$$|x_s| > 1 \quad \text{and} \quad |x_s| > x_2 \qquad (s = 3, \ldots, n). \tag{41}$$

Consequently, for large values of k, the most important terms in the right members of (35), (38) and (39) correspond to the real positive roots $x_1 = 1$ and $x_2$ of equation (33). By omitting the terms corresponding to s = 3, ⋯, n, one can derive approximations to the g and h and their sums that are applicable for large k. In fact, for np near unity the roots corresponding to s = 3, 4, ⋯ are considerably greater than unity in absolute value. This is illustrated in the following table of roots for the case np = 1:

n = 2, p = 1/2: x_s = 1, 1;
n = 3, p = 1/3: x_s = 1, 1, −8;
n = 4, p = 1/4: x_s = 1, 1, −7 ± 4√−2;
n = 5, p = 1/5: x_s = 1, 1, −12.2531⋯, −4.8734⋯ ± 7.7343⋯√−1;

and for s = 3, 4, 5, ⋯, $|x_s|$ is greater than 8.

For very large values of n and small values of p, one can find approximate values for the roots by solving the limit equation obtained from (33) by putting

a = np

and letting n approach infinity. This equation is

$$e^{a(x - 1)} = x, \tag{42}$$

where e is the base of the natural logarithms. For the case a = 1 the roots are

1, 1, 3.0891⋯ ± 7.4602⋯√−1, 3.66⋯ ± 13.88⋯√−1,

and, approximately,

$$x_s \approx 1 + \log b \pm b\sqrt{-1},$$

where

b = (2u + 1/2)π, u = 3, 4, 5, ⋯.
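The quoted roots can be checked numerically. The sketch below does three things:

- It evaluates the relative residual of (33) at the roots listed for np = 1.
- It does the same for (42) at the two complex roots quoted for a = 1.
- It finds the second real root $x_2$ of (33) by bisection when np ≠ 1.

Bisection and its bracketing strategy are choices of this sketch, not the author's method, and the function names are illustrative.

```python
import cmath
import math


def residual_33(x, n, p):
    """(p x + q)^n - x; its zeros are the roots x_s of (33)."""
    return (p * x + 1.0 - p) ** n - x


def residual_42(x, a):
    """e^{a(x - 1)} - x; its zeros are the roots of the limit equation (42)."""
    return cmath.exp(a * (x - 1.0)) - x


def second_real_root(n, p, iterations=200):
    """Real positive root x_2 != 1 of (33) by bisection, assuming n >= 2 and np != 1."""
    def f(x):
        return residual_33(x, n, p)

    if n * p < 1.0:
        lo, hi = 1.0 + 1e-9, 2.0        # f < 0 just above 1 and f grows without bound
        while f(hi) < 0.0:
            hi *= 2.0
    else:
        lo, hi = 0.0, 1.0 - 1e-9        # f(0) = q^n > 0 and f < 0 just below 1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(mid) < 0.0) == (f(lo) < 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Roots quoted in the text for np = 1, apart from the double root x = 1
quoted_33 = {
    3: [-8.0],
    4: [complex(-7.0, 4.0 * math.sqrt(2.0)), complex(-7.0, -4.0 * math.sqrt(2.0))],
    5: [-12.2531, complex(-4.8734, 7.7343), complex(-4.8734, -7.7343)],
}
for n, roots in quoted_33.items():
    for x in roots:
        print(n, x, abs(residual_33(x, n, 1.0 / n)) / abs(x))   # relative residual

# Complex roots quoted for the limit equation (42) with a = 1
quoted_42 = [complex(3.0891, 7.4602), complex(3.66, 13.88)]
for x in quoted_42:
    print(x, abs(residual_42(x, 1.0)) / abs(x))

print(second_real_root(10, 0.05), second_real_root(10, 0.15))
```

The quoted values are rounded, so their residuals are small but not zero; those for the n = 3 and n = 4 roots vanish up to floating-point error.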





From equation (39) and these numerical results it follows that, even with k as small as 3, the percentage error for the case np = 1 introduced in $g_k$ by omitting the terms in the indicated sum is less than .002%. Consequently, for all practical purposes, one may omit the complex and negative roots for values of k greater than 3 in computing the g's for np in the neighborhood of unity. For smaller values of k the exact values of the g's are readily obtainable from (29).

5. Special cases. Consider first the case in which c < 0 and $n_0 \le k + c$. Under these conditions a lot can never be accepted or rejected on the initial sample, and the indicated sum in the right member of (25) is zero. Furthermore, for this case the sum (32) becomes

$$S(x) = (px + q)^{n_0} x^{-c}. \tag{43}$$

Consequently it follows from (33) that

$$S(x_s) = x_s^{t - c}, \tag{44}$$

where

$$t = n_0/n. \tag{45}$$

It should be noted, however, that for t not an integer the right member of (44) is multiple valued. One must take that value for which

$$x_s^t = (p x_s + q)^{n_0}. \tag{46}$$

Thus for real positive values of $x_s$ the right member of (44) is real. For integral values of t there is, of course, no ambiguity in the notation.

If (44) is substituted in the second equation of (35), the resulting expression for the h coefficient is of the same form as that for the g coefficient; in fact

$$h_{k+1} = g_{k - t + c + 1}.$$

Hence, by (25), the probability of acceptance for this case is

$$\Pi = g_{k - t + c + 1}/g_{k+1}. \tag{47}$$

In a similar manner it follows from (43) and (46) that the sum of the h coefficients, equation (38), is

$$H_k = G_{k - t + c} + t,$$

and hence, by (26), the expected number of additional samples is

$$E = \Pi G_k - G_{k - t + c} - t. \tag{48}$$

The initial sample is nt units and the additional samples are all equal to n units. Hence the expected total number of units sampled (initial plus additional samples) is

$$I = n_0 + nE = n(\Pi G_k - G_{k - t + c}). \tag{49}$$
Since for this case it is impossible to accept or reject on the initial sample, one could combine the initial sample with the first additional sample. In fact, one can continue combining initial and additional samples, thus increasing c and t. This is permissible provided the new initial sample $n_0$ and the new c value thus obtained satisfy

$$c \le 0, \qquad n_0 = nt \le k + n - 1 + c. \tag{50}$$

In this process of combining samples, t and c increase at the same rate. Consequently formula (47) and the right member of (49) are unchanged. In other words, formulas (47) and (49) may also be used under conditions (50).

It was demonstrated in Section 4 that, for k sufficiently large, one can omit those terms in (35) and (38) corresponding to complex or negative roots of (33). If this is done, the following useful approximations for the g and G are obtained:

$$g_k \approx (1 - np)^{-1} + [q - (n - 1)px]^{-1} x^{1/n - k},$$

$$G_k \approx k(1 - np)^{-1} - \tfrac{1}{2}n(n - 1)p^2 (1 - np)^{-2} + [q - (n - 1)px]^{-1} x^{1/n - k} (1 - x)^{-1}, \tag{51}$$

provided np ≠ 1 and k ≥ 1. Here x is the real positive root, not equal to unity, of

$$(px + q)^n = x \qquad (np \ne 1). \tag{52}$$

For np = 1 these approximations become, by (39),

$$(n - 1)\, g_k \approx 2kn + 2n/3 - 4/3,$$

$$(n - 1)\, G_k \approx k^2 n + 5kn/3 + n/18 - 4k/3 - 1/18 - n^{-1}/9 \qquad (k \ge 1). \tag{53}$$

These formulae, in conjunction with formulae (47) and (49), give quite satisfactory approximations for the probability of acceptance Π and the expected total number of units sampled. This holds even when the values of the subscripts employed are as small as 3. Of course, the larger the value of k in (51), (52) or (53), the better these approximations.
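The claim that the approximations for np = 1 are already accurate for small k can be checked directly. Those approximations are

- $(n - 1)g_k \approx 2kn + 2n/3 - 4/3$, and
- $(n - 1)G_k \approx k^2 n + 5kn/3 + n/18 - 4k/3 - 1/18 - 1/(9n)$.

The sketch compares them with exact coefficients obtained by long division of (27), here for n = 4, p = 1/4. The choice of n and the function names are illustrative assumptions.

```python
from math import comb


def g_series(count, n, p):
    """g_1..g_count from the expansion (27) of 1 / ((px + q)^n - x)."""
    q = 1.0 - p
    den = [comb(n, m) * p ** m * q ** (n - m) for m in range(n + 1)]
    den[1] -= 1.0
    g = []
    for j in range(count):
        acc = 1.0 if j == 0 else 0.0
        for i in range(1, min(j, n) + 1):
            acc -= den[i] * g[j - i]
        g.append(acc / den[0])
    return g


def approx_53(k, n):
    """Approximations to g_k and G_k at np = 1, with complex and negative roots dropped."""
    g_k = (2 * k * n + 2 * n / 3 - 4 / 3) / (n - 1)
    big_g_k = (k * k * n + 5 * k * n / 3 + n / 18 - 4 * k / 3 - 1 / 18 - 1 / (9 * n)) / (n - 1)
    return g_k, big_g_k


n = 4
exact = g_series(8, n, 1.0 / n)
for k in range(1, 9):
    approx_g, approx_big_g = approx_53(k, n)
    rel_g = abs(approx_g / exact[k - 1] - 1.0)
    rel_big_g = abs(approx_big_g / sum(exact[:k]) - 1.0)
    print(k, f"{rel_g:.2e}", f"{rel_big_g:.2e}")
```

The relative errors fall off quickly with k, since the omitted terms decay like $|x_s|^{-k}$ with $|x_s| = 9$ for n = 4.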

Now the root x of (52) is greater or less than unity according as the product a = np is less or greater than unity. Consequently it follows from (47) and (51) that, for c = 0 and t finite,

$$\Pi' = \lim_{k \to \infty} \Pi = \lim_{k \to \infty} g_{k - t + 1}/g_{k+1} = \begin{cases} 1, & np < 1; \\ x^t, & np > 1; \end{cases} \tag{54}$$

while, by (49) and (51), the expected total number of units sampled has the limiting value

$$I' = \lim_{k \to \infty} I = \begin{cases} n_0 (1 - np)^{-1}, & np < 1; \\ \infty, & np > 1. \end{cases} \tag{55}$$
But k infinite implies that under no circumstance can a lot be rejected. Consequently Π' and I' are the exact values of the probability of acceptance and of the expected total sample, respectively, for the following sampling procedure:

The initial sample is $n_0 = nt$ and all additional samples are n. The lot is accepted if no defects are found on the initial sample, or if after taking r additional samples a total of exactly r defects is found.

In inspection problems p is usually small and n large, so that the approximation (42) may be used to determine the real positive root x; thus

$$e^{a(x - 1)} = x \qquad (a = np). \tag{56}$$

It then follows from (54) and (55) that, for np > 1,

$$\frac{-\log \Pi'}{1 - x} = n_0 p, \qquad \frac{-\log x}{1 - x} = np. \tag{57}$$

These relations are of course equivalent to (54) and (56). Suppose that the probability Π' and the fraction p are assigned. Then the initial sample $n_0$ and the additional sample n will depend only on the parameter x.

Consider next the problem of sampling a number of lots that fall into two categories: those containing a fraction p of defects, and those containing a fraction p* of defects, where p* < p. Suppose in addition that the sampling procedure is to be such that lots with fraction p* of defects are eventually accepted, while lots with fraction p of defects have a small assigned probability of acceptance Π'. Then these conditions are satisfied whatever the value of x, as long as the resulting np* < 1. Furthermore, one may insist that the expected total sample for lots containing a fraction p*, namely, by (55),

$$I'(p^*) = n_0 (1 - np^*)^{-1},$$

be a minimum. It is then found that

$$x = p^*/p. \tag{58}$$

This remarkably simple result is capable of still greater generalization. By an altogether different approach to the problem, the author has succeeded in proving the following result. Of all possible multiple sampling procedures, the multiple sampling method here described and defined by equations (57) and (58) gives the minimum expected inspection for the problem under consideration, provided n is sufficiently large.*

By letting both −c and k approach infinity, it is possible to derive probability formulae for a sampling procedure in which a lot is either rejected or the sampling continues without end. These formulae are included in Table I, along with other special cases derived from the previously listed general formulae.

* Note: The author has postponed publication of this proof in the hope that it might be generalized to include sampling problems involving both acceptance and rejection of a lot.
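Under the limiting approximation (56), relations (57) and (58) fix the whole design once Π', p and p* are chosen. The sketch below works through that chain:

- It sets x = p*/p as in (58).
- It obtains n and $n_0$ from the two relations (57).
- It evaluates the expected total sample $I'(p^*) = n_0(1 - np^*)^{-1}$.

It treats n and $n_0$ as real numbers, whereas in practice they would be rounded to integers. The requirements in the example and the function names are hypothetical.

```python
import math


def expected_total_for_x(x, pi_prime, p, p_star):
    """I'(p*) = n0 / (1 - n p*), with n and n0 fixed by (57) for a root x in (0, 1)."""
    n = -math.log(x) / ((1.0 - x) * p)          # second relation of (57)
    n0 = -math.log(pi_prime) / ((1.0 - x) * p)  # first relation of (57)
    return n0 / (1.0 - n * p_star)


def design_from_57_58(pi_prime, p, p_star):
    """Limiting-case design: x = p*/p from (58), then n, n0 and I'(p*)."""
    x = p_star / p
    n = -math.log(x) / ((1.0 - x) * p)
    n0 = -math.log(pi_prime) / ((1.0 - x) * p)
    return x, n, n0, expected_total_for_x(x, pi_prime, p, p_star)


# Hypothetical requirements: Pi' = 0.05 at p = 0.05; lots with p* = 0.01 to be accepted
x, n, n0, total = design_from_57_58(pi_prime=0.05, p=0.05, p_star=0.01)
print(f"x = {x:.3f}, n = {n:.1f}, n0 = {n0:.1f}, I'(p*) = {total:.1f}")
```

With these numbers the sketch gives n ≈ 40.2 and $n_0$ ≈ 74.9. Nearby values of x passed to `expected_total_for_x` give a larger I'(p*), consistent with (58).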
TABLE I

Notation:

n = number of units in each additional sample
$n_0$ = number of units in initial sample
p = fraction defective in lot
a = np
q = 1 − p
c = maximum number of defects in initial sample for acceptance
t = $n_0/n$ = ratio of initial sample to additional samples
f = c + k + 1 = minimum number of defects in initial sample for rejection
c + r = number of defects in initial plus first r additional samples for acceptance
f + r = c + k + 1 + r = minimum number of defects in initial plus first r additional samples for rejection
Π = probability of eventually accepting lot with fraction p defects
1 − Π = probability of eventually rejecting lot with fraction p defects
I = expected total number of units sampled (i.e., initial plus whatever additional samples are taken)
x = real positive root different from unity of the equation $(px + q)^n = x$

Conditions; Π; I

(a) k = 1, c = 0, f = 2:
    $\Pi = q^{n_0}[1 - (n - n_0)pq^{n-1}](1 - npq^{n-1})^{-1}$
    $I = n_0 + n n_0 p q^{n_0 - 1}(1 - npq^{n-1})^{-1}$

(b) k = 1, c = 0, f = 2, $n_0 = n$:
    $\Pi = q^n (1 - npq^{n-1})^{-1}$
    $I = n(1 - npq^{n-1})^{-1}$

(c) c = −k, f = 1:
    $\Pi = q^{n_0 - n}/g_{k+1}$
    $I = n_0 + n q^{n_0 - n} G_k/g_{k+1}$

(d) k = 1, c = −1, f = 1:
    $\Pi = q^{n_0 + n}(1 - npq^{n-1})^{-1}$
    $I = n_0 + n q^{n_0}(1 - npq^{n-1})^{-1}$

(e) k = 2, c = −2, f = 1:
    $\Pi = q^{n_0 + 2n}[1 - 2npq^{n-1} + \tfrac{1}{2}n(n + 1)p^2 q^{2n-2}]^{-1}$
    $I = n_0 + n q^{n_0}(1 + q^n - npq^{n-1})[1 - 2npq^{n-1} + \tfrac{1}{2}n(n + 1)p^2 q^{2n-2}]^{-1}$

(f) k = −c = ∞, f = 1:
    Π = 0 for np > 1; $\Pi = q^{n_0 - n}(1 - np)$ for np < 1*
    $I = n_0 + n q^{n_0 - n} x(1 - x)^{-1}$ for np > 1; I = ∞ for np < 1

(g) c = 0, n = 2, $n_0 = f = k + 1$:
    $\Pi = [1 + (p/q)^{n_0}]^{-1}$
    $I = n_0(2\Pi - 1)/(q - p)$

(h) c = 0, n = 2, $n_0 = f = k + 1$, p = 1/2:
    Π = 0.5
    $I = n_0^2$

c = 0, $n_0 = n$:
    $\Pi = g_k/g_{k+1}$
    $I = n(\Pi G_k - G_{k-1})$

c = 0, $n_0 = n$, k = f = ∞:
    Π = 1 (np < 1); Π = x (np > 1)
    $I = n_0(1 - np)^{-1}$ (np < 1); I = ∞ (np > 1)

* In this sampling procedure a lot cannot be accepted, so that Π is the probability that additional samples will be taken without end. The probability of rejecting a lot is, however, 1 − Π.


TABLE II

Values of g and G for Limit n = ∞, p = 0

a = np    0.2558   0.4024   0.6931   1        1.3863   2.0118   2.5584
x         10       5        2        1        .5       .2       .1

g_1       1.292    1.495    2.000    2.718    4.000    7.477    12.915
g_2       1.338    1.634    2.614    4.671                      133.76
g_3       1.3432   1.665    2.935    6.667    23.48             1343.2
g_4       1.3437   1.6717            8.667    49.55             13.4 × 10³
g_5       1.3438   1.6729   3.178                      5228.    134 × 10³
g_∞       1.3438   1.6732   3.2589   ∞        ∞        ∞        ∞

G_1       1.292    1.495    2.000    2.718    4.000    7.477    12.915
G_2       2.629    3.130    4.614    7.389    14.45    48.34    146.7
G_3       3.972    4.795    7.549    14.05    37.93    256.5    1490
G_4       5.316    6.467    10.65    22.72    87.5     1301.    14.9 × 10³
G_5                8.140    13.82    33.39    189.2    6529.    149 × 10³
As an illustration of the method of application of these formulae, suppose that the sampling procedure is to be such that the probability Π of accepting a "p" value of 0.5 + ε equals the probability of rejecting a "p" value of 0.5 − ε. By Table I, formula (g), this condition on probabilities is always satisfied if c = 0, n = 2, and $n_0 = k + 1$. This corresponds to a multiple sampling scheme in which additional samples are only two units each, and a lot is accepted or rejected on the initial sample if none or all units are defective.

With ε = 0.1 and Π ≤ 1/6, one can take $n_0 = 4$ and k = 3. The expected total number of units examined depends on "p". For this numerical case it varies from 4, for p = 0 or 1, to a maximum of 16, for p = 0.5. By contrast, a single sample plan satisfying the same conditions would require a sample of 23 units, whatever the value of p.
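The comparison with a single sample plan can be explored by brute force using formula (2). The sketch below assumes that "satisfying the same conditions" means Π ≤ 1/6 at p = 0.6 and Π ≥ 5/6 at p = 0.4. It then searches for the smallest n, with some acceptance number c, meeting both requirements. That interpretation and the function names are assumptions of the sketch.

```python
from math import comb


def accept_prob(c, n, p):
    """Single-sampling acceptance probability, equation (2)."""
    return sum(comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(c + 1))


def smallest_single_plan(p_bad, p_good, pi_max, max_n=200):
    """Smallest n (and a matching c) with Pi(p_bad) <= pi_max and Pi(p_good) >= 1 - pi_max."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            if accept_prob(c, n, p_bad) <= pi_max and accept_prob(c, n, p_good) >= 1 - pi_max:
                return n, c
    return None


plan = smallest_single_plan(0.6, 0.4, 1 / 6)
print(plan)
```

Whatever n the search returns, it is well above the maximum expected total of 16 units for the multiple plan with $n_0 = 4$, k = 3.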

The previous problem is, however, not typical of those encountered in commercial inspection, for in such situations p is usually very small. In practice one can generally replace the formulae in Table I by their limiting values for n = ∞, p = 0, and np = a. Table II gives the limiting values of the g and G, as well as of x, for a small number of values of a.
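In this limit, $(px + q)^n$ tends to $e^{a(x-1)}$. The limiting coefficients of Table II are therefore those of the expansion of $1/(e^{a(x-1)} - x)$, and x is the root ≠ 1 of $e^{a(x-1)} = x$. The sketch below computes both: the coefficients by series division, and the root by bisection. The function names and the bisection bracket are illustrative choices, not the author's method.

```python
import math


def limiting_g(count, a):
    """g_1..g_count for n -> infinity, p -> 0, np = a: coefficients of 1/(e^{a(x-1)} - x)."""
    den = [math.exp(-a) * a ** j / math.factorial(j) for j in range(max(count, 2))]
    den[1] -= 1.0                              # subtract x from the series for e^{a(x-1)}
    g = []
    for j in range(count):
        acc = 1.0 if j == 0 else 0.0
        for i in range(1, j + 1):
            acc -= den[i] * g[j - i]
        g.append(acc / den[0])
    return g


def limit_root(a, iterations=200):
    """Real positive root x != 1 of e^{a(x-1)} = x by bisection, assuming a != 1."""
    def f(x):
        return math.exp(a * (x - 1.0)) - x

    if a < 1.0:
        lo, hi = 1.0 + 1e-9, 2.0
        while f(hi) < 0.0:
            hi *= 2.0
    else:
        lo, hi = 0.0, 1.0 - 1e-9
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(mid) < 0.0) == (f(lo) < 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for a in (0.2558, 0.4024, 0.6931, 1.0, 1.3863, 2.0118, 2.5584):
    g = limiting_g(5, a)
    big_g = [sum(g[:k]) for k in range(1, 6)]
    root = 1.0 if a == 1.0 else limit_root(a)
    print(a, round(root, 3), [round(v, 3) for v in g], [round(v, 3) for v in big_g])
```

Blank cells in the printed table can be compared with this output.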

Finally, the justification for multiple sampling lies in the fact that a reduction in the expected total sample is possible. Though this paper is limited to the consideration of a very elementary type of sampling, it indicates that it might be worthwhile to investigate the possibility of utilizing the methods of multiple sampling in inspection for variables. Unfortunately, serious mathematical difficulties are encountered even in so simple a problem as multiple sampling from a normal population for the mean.



AN EXACT TEST FOR RANDOMNESS IN THE NON-PARAMETRIC CASE 
BASED ON SERIAL CORRELATION^ 

By a, Wald and J. Wolfowitz 
Columbia University 

1. Introduction. A sequence of variates $x_1, \ldots, x_N$ is said to be a random series, or to satisfy the condition of randomness, if $x_1, \ldots, x_N$ are independently distributed with the same distribution. That is, the joint cumulative distribution function (c.d.f.) of $x_1, \ldots, x_N$ is given by the product $F(x_1) \cdots F(x_N)$, where F(x) may be any c.d.f.

The problem of testing randomness arises frequently in quality control of manufactured products. Suppose that x is some quality character of a product, and that $x_1, x_2, \ldots, x_N$ are the values of x for N consecutive units of the product arranged in some order (usually the order in which they were produced). The production process is said to be in a state of statistical control if the sequence $(x_1, \ldots, x_N)$ satisfies the condition of randomness.

A number of tests of randomness have been devised for purposes of quality control, all having the following features in common:

1) They are based on runs in the sequence $x_1, \ldots, x_N$.

2) The test procedure is invariant under topological transformation of the x-axis. That is, the test procedure leads to the same result if the original variates $x_1, \ldots, x_N$ are replaced by $x'_1, \ldots, x'_N$, where $x'_\alpha = f(x_\alpha)$ and f(t) is any continuous and strictly monotonic function of t.

3) The size of the critical region, i.e., the probability of rejecting the hypothesis of randomness when it is true, does not depend on the common c.d.f. F(x) of the variates $x_1, \ldots, x_N$.

Condition (3) is a fortiori fulfilled if condition (2) is satisfied and F(x) is continuous. The fulfillment of condition (3) is very desirable, since in many practical applications the form of the c.d.f. F(x) is unknown.

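Tests of randomness are of importance also in the analysis of time series
(particularly of economic time series), where they are frequently based on the
so-called serial correlation. The serial correlation coefficient with lag $h$ is defined
by the expression² (see, for instance, Anderson [1])

$$R_h = \frac{\sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha} - \left(\sum_{\alpha=1}^{N} x_\alpha\right)^2 / N}{\sum_{\alpha=1}^{N} x_\alpha^2 - \left(\sum_{\alpha=1}^{N} x_\alpha\right)^2 / N}, \tag{1}$$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of $\alpha$ for which $h + \alpha > N$.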
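As an illustration, the following minimal Python sketch computes the circular serial correlation coefficient of lag $h$, assuming the usual circular form (lagged product sum minus $(\sum x_\alpha)^2/N$, divided by $\sum x_\alpha^2 - (\sum x_\alpha)^2/N$). The function names `lagged_product_sum` and `serial_correlation` are our own, not the paper's.

```python
def lagged_product_sum(x, h):
    """Sum of x[a] * x[a + h] with circular wrap-around (x_{h+a} -> x_{h+a-N})."""
    n = len(x)
    return sum(x[a] * x[(a + h) % n] for a in range(n))


def serial_correlation(x, h):
    """Circular serial correlation coefficient of lag h (assumed circular form)."""
    n = len(x)
    s1 = sum(x)
    s2 = sum(v * v for v in x)
    correction = s1 * s1 / n
    denominator = s2 - correction
    if denominator == 0:
        raise ValueError("undefined when all observations are equal")
    return (lagged_product_sum(x, h) - correction) / denominator


print(serial_correlation([1, 2, 3, 4], 1))  # -0.2
```

For $x = (1, 2, 3, 4)$ the lag-1 product sum is 24, $(\sum x_\alpha)^2/N = 25$ and $\sum x_\alpha^2 = 30$, so the coefficient is $(24 - 25)/5 = -0.2$.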
The distribution of $R_h$ has recently been studied by R. L. Anderson [1], T.
Koopmans [2], L. C. Young [3], J. v. Neumann [4, 5], B. I. Hart and J. v. Neumann [6],
and J. D. Williams [7], under the assumption that $x_1, \dots, x_N$ are
independently distributed with the same normal distribution. Thus, in addition
to the randomness of the series $(x_1, \dots, x_N)$, it is assumed that the common
c.d.f. of the variates $x_1, \dots, x_N$ is normal. This is a restrictive assumption,
since frequently the form of the common c.d.f. $F(x)$ of the variates $x_1, \dots, x_N$
is unknown.

¹ Presented to the Institute of Mathematical Statistics and the American Mathematical
Society at a joint meeting at New Brunswick, New Jersey, on September 13, 1943.

² Some authors (see, for instance, [2] p. 27, equation (61)) use a non-circular definition.

The purpose of this paper is to develop a test procedure based on $R_h$ such that

(a) if $F(x)$ is continuous, the size of the critical region does not depend on the
common c.d.f. $F(x)$ of the variates $x_1, \dots, x_N$, thus making an exact test of
significance possible also when nothing is known about $F(x)$ except its continuity;

(b) if $F(x)$ is not continuous, but all its moments are finite and its variance is
positive, the size of the critical region approaches, as $N \to \infty$, the value it would
have if $F(x)$ were continuous. Thus in the limit an exact test is possible in this
case as well. We will refer to the case where the form of $F(x)$ is unknown as the
non-parametric case, in contrast to the case when it is known that $F(x)$ is a
member of a finite parameter family of c.d.f.'s.

The test based on the serial correlation seems to be suitable if the alternative
to randomness is the existence of a trend³ or of some regular cyclical movement in
the data. In the analysis of time series it is frequently assumed that this is the
case, and this is perhaps why tests based on serial correlation are widely used
in the analysis of time series. In quality control of manufactured products the
existence of a trend, caused perhaps by the steady deterioration of a machine in
the production process, is often considered as the alternative to randomness.
Thus, tests of randomness based on serial correlation could also be used in
quality control.

2. An exact test procedure based on $R_h$. Let $a_\alpha$ be the observed value of
$x_\alpha$ ($\alpha = 1, \dots, N$). Consider the subpopulation where the set $(x_1, \dots, x_N)$ is
restricted to permutations of $a_1, \dots, a_N$. In this subpopulation the probability
that $(x_1, \dots, x_N)$ is any particular permutation $(a'_1, \dots, a'_N)$ of $(a_1, \dots, a_N)$
is equal to $1/N!$ if the hypothesis to be tested, i.e., that of randomness, is
true. (If two of the $a_i$ ($i = 1, 2, \dots, N$) are identical, we assume that some
distinguishing index is attached to each so that they can then be regarded as distinct
and so that there still are $N!$ permutations of the elements $a_1, \dots, a_N$.)

The probability distribution of $R_h$ in this subpopulation can be determined as
follows: Consider the set of $N!$ values of $R_h$ which are obtained by substituting
for $(x_1, \dots, x_N)$ all possible permutations of $(a_1, \dots, a_N)$. (A value which
occurs more than once is counted as many times as it occurs.) Each of these
values of $R_h$ has the probability $1/N!$. On the basis of this distribution of $R_h$
an exact test of significance can be carried out. Suppose that $\alpha$ is the level of
significance, i.e., the size of the critical region. We choose as critical region a
subset of $M$ values out of the set of $N!$ values of $R_h$, where $M/N! = \alpha$. The
subset of $M$ values which constitute the critical region will depend in each particular
problem on the possible alternatives to randomness. For example, if a linear
trend is the only possible alternative to randomness, then the critical region will
consist of the $M$ largest values⁴ of $R_h$. The value of the lag $h$ will also be chosen
on the basis of the alternatives under consideration. For instance, if some
cyclical movement in the data is suspected, the choice of $h$ will depend on the
form of these cycles. The general idea underlying the choice of the subset of $M$
values and of the lag is to make the power of the test with respect to the
alternatives which are particularly feared as high as possible.

³ If the existence of a trend is feared it may be preferable to use the non-circular statistic
discussed, for example, in [2].
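For very small $N$ the exact procedure can be sketched by brute force. The sketch below enumerates all $N!$ orderings of the observed values (ties treated as distinct, as in the text) and reports the share whose lag-$h$ product sum is at least the observed one. Rejecting when this share is at most $\alpha$ corresponds, apart from ties, to a critical region made of the largest values, as for the trend alternative. The helper names are our own.

```python
from fractions import Fraction
from itertools import permutations


def lagged_product_sum(x, h):
    """Circular lagged product sum: sum of x[a] * x[(a + h) % N]."""
    n = len(x)
    return sum(x[a] * x[(a + h) % n] for a in range(n))


def exact_upper_tail(observed, h=1):
    """Share of the N! orderings whose statistic is >= the observed one.

    Every ordering is enumerated, so this is feasible only for small N.
    """
    r_obs = lagged_product_sum(observed, h)
    values = [lagged_product_sum(p, h) for p in permutations(observed)]
    hits = sum(1 for v in values if v >= r_obs)
    return Fraction(hits, len(values))


print(exact_upper_tail([1, 2, 3, 4]))  # 2/3
```

For $(1, 2, 3, 4)$ the 24 orderings give the values 21, 24 and 25, eight times each, because a cyclic shift or a reversal of a sequence leaves the circular sum unchanged; tied values therefore come in groups, and the enumeration simply counts every ordering.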

If $R_h$ has the same value for several permutations of $(a_1, \dots, a_N)$, it may be
impossible to have a critical region consisting of exactly $M$ values of $R_h$. For
example, if $a_1 = a_2 = \cdots = a_N$, then all the $N!$ values of $R_h$ are equal, and the
number of values of $R_h$ included in the critical region must be either 0 or $N!$. If
$F(x)$ is continuous the probability that two values of $R_h$ be equal is zero. This
explains why an exact test is always possible when $F(x)$ is continuous. On the
other hand, if $F(x)$ is not continuous, the probability that several values of $R_h$
be equal is positive. However, the theorem we shall prove in Section 4 shows
that in the limit an exact test is possible even when $F(x)$ is not continuous, but
has finite moments and a positive variance. For if the latter is true, the
probability is one that the weaker conditions for the validity of our theorem
(given at the end of Section 4) will be fulfilled.

Consider the statistic

$$R'_h = \sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha}, \tag{2}$$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of $\alpha$ for which $h + \alpha > N$.
Since in the subpopulation under consideration $\sum_{\alpha=1}^{N} x_\alpha$ and $\sum_{\alpha=1}^{N} x_\alpha^2$ are
constants, the statistic $R_h$ is a linear function of $R'_h$ in this subpopulation. Hence,
the test based on $R_h$ is equivalent to the test based on $R'_h$. Since $R'_h$ is simpler
than $R_h$, in what follows we shall restrict ourselves to the statistic $R'_h$.

We shall now show that, if $h$ is prime to $N$, the totality $T_h$ of the $N!$ values
taken by $R'_h$ is the same as $T_1$, the totality of the $N!$ values taken by $R'_1$.

In the argument which follows it is to be understood that, whenever a positive
integer is greater than $N$, it is to be replaced by that positive integer less than or
equal to $N$ which differs from it by an integral multiple of $N$.

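Clearly it will be sufficient to show the existence of a permutation $p_1, p_2, \dots, p_N$
of the first $N$ integers such that

$$p_i + 1 = p_{i+h} \qquad (i = 1, 2, \dots, N). \tag{3}$$

Indeed, if (3) holds and we put $x_i = y_{p_i}$, then $\sum_i x_i x_{i+h} = \sum_i y_{p_i} y_{p_i+1} = \sum_k y_k y_{k+1}$;
since $y \mapsto x$ is a one-to-one correspondence between the $N!$ arrangements, $T_h$ and $T_1$ coincide.

Such a permutation is given by

$$p_{(j-1)h+1} = j \qquad (j = 1, 2, \dots, N).$$

For if $j \neq j'$, then $(j-1)h + 1 \neq (j'-1)h + 1$ because $h$ is prime to $N$. Hence
to every positive integer $i$ there is a unique positive integer $j$ ($i, j \le N$) such that

$$i = (j-1)h + 1.$$

Now

$$p_i + 1 = p_{(j-1)h+1} + 1 = j + 1 = p_{jh+1} = p_{i+h},$$

which is the required result.

⁴ See footnote 3.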

In what follows we shall restrict ourselves to the case when $h$ is prime to $N$.
This is not a very restrictive assumption, since in practice $h$ will be small as
compared with $N$, and by omitting a few observations we can always make $N$ prime
to $h$. Since $T_h$ is the same as $T_1$, we shall deal with the statistic $R'_1$ only. To
simplify the notation we shall write $R$ instead of $R'_1$. Thus, the test procedure
will be based on the statistic

$$R = \sum_{\alpha=1}^{N-1} x_\alpha x_{\alpha+1} + x_N x_1. \tag{4}$$
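The permutation argument of Section 2 can be checked mechanically. The sketch below builds $p$ in 0-based form (position $jh \bmod N$ receives the value $j$, which is $p_{(j-1)h+1} = j$ shifted by one), verifies property (3), and confirms on a small example that $T_1$ and $T_h$ coincide as multisets. The function names and the test data are our own choices.

```python
from itertools import permutations
from math import gcd


def lagged_product_sum(x, h):
    """Circular sum of x[a] * x[(a + h) % N]."""
    n = len(x)
    return sum(x[a] * x[(a + h) % n] for a in range(n))


def lag_permutation(n, h):
    """0-based form of p_{(j-1)h+1} = j: position (j*h) % n receives the value j."""
    if gcd(h, n) != 1:
        raise ValueError("h must be prime to n")
    p = [0] * n
    for j in range(n):
        p[(j * h) % n] = j
    return p


n, h = 7, 3
p = lag_permutation(n, h)
# Property (3), p_i + 1 = p_{i+h}, with every index reduced modulo n.
assert all((p[i] + 1) % n == p[(i + h) % n] for i in range(n))

# Rearranging y as x_i = y[p_i] turns the lag-1 sum into the lag-h sum.
y = [3, -1, 4, 1, 5, 9, 2]
x = [y[p[i]] for i in range(n)]
print(lagged_product_sum(x, h), lagged_product_sum(y, 1))  # 71 71

# Consequently T_1 and T_h coincide as multisets (here N = 5, h = 2).
a = [1, 2, 3, 5, 8]
t1 = sorted(lagged_product_sum(q, 1) for q in permutations(a))
t2 = sorted(lagged_product_sum(q, 2) for q in permutations(a))
assert t1 == t2
```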

If $N$ is very small, an exact test of significance can be carried out by actually
calculating the $N!$ possible values of $R$. However, this procedure is practically
impossible if $N$ is not small. In Section 3 the exact mean value and variance of
$R$ will be calculated, and in Section 4 the normality of the limiting distribution
of $R$ will be proved. Thus, if $N$ is sufficiently large so that the limiting
distribution of $R$ can be used, a test of significance can easily be carried out. Difficulties
in carrying out the test arise if $N$ is neither sufficiently small to make the
computation of the $N!$ values of $R$ practically possible, nor sufficiently large to permit the
use of the limiting distribution. In such cases it may be helpful to determine
the third and fourth, and perhaps higher, moments of $R$, on the basis of which
upper and lower limits for the cumulative distribution of $R$ can be derived.
(For a description of the Tchebycheff inequalities by which this can be done see,
for example, Uspensky [8], pp. 373-380.) Since the limiting distribution is
normal, it may be useful to approximate the distribution by a Gram-Charlier
series or to employ similar methods.

3. Mean value and variance of $R$.⁵ It is clear that

$$E(R) = N E(x_1 x_2) = \frac{1}{N-1} \sum_{\alpha \neq \beta} a_\alpha a_\beta = \frac{1}{N-1}\left[(a_1 + \cdots + a_N)^2 - (a_1^2 + \cdots + a_N^2)\right]. \tag{5}$$

To calculate the variance of $R$ we first calculate the second moment of $R$ about
the origin. We have

$$E(R^2) = E(x_1 x_2 + \cdots + x_{N-1} x_N + x_N x_1)^2 = N E(x_1^2 x_2^2) + 2N E(x_1^2 x_2 x_3) + (N^2 - 3N) E(x_1 x_2 x_3 x_4). \tag{6}$$

⁵ The first four moments of a similar statistic have been obtained by Young [3].
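Both (5) and (6) can be confirmed for a small example by averaging over all $N!$ orderings with exact rational arithmetic. In this sketch the data and helper names are our own; (6) needs $N \ge 4$ for the last group of index pairs to be non-empty.

```python
from fractions import Fraction
from itertools import permutations


def perm_mean(values, f):
    """Exact mean of f(x) over all orderings x of `values` (ties kept distinct)."""
    perms = list(permutations(values))
    return Fraction(sum(f(x) for x in perms), len(perms))


def circular_r(x):
    """Statistic (4): x_1 x_2 + ... + x_{N-1} x_N + x_N x_1."""
    n = len(x)
    return sum(x[a] * x[(a + 1) % n] for a in range(n))


a = [1, 2, 3, 5, 8]
n = len(a)
e_r = perm_mean(a, circular_r)
e_r2 = perm_mean(a, lambda x: circular_r(x) ** 2)

# Right-hand side of (5): [(sum a)^2 - sum a^2] / (N - 1).
assert e_r == Fraction(sum(a) ** 2 - sum(v * v for v in a), n - 1)

# Right-hand side of (6), each expectation also taken over all orderings.
rhs = (n * perm_mean(a, lambda x: x[0] ** 2 * x[1] ** 2)
       + 2 * n * perm_mean(a, lambda x: x[0] ** 2 * x[1] * x[2])
       + (n * n - 3 * n) * perm_mean(a, lambda x: x[0] * x[1] * x[2] * x[3]))
assert e_r2 == rhs
print(e_r, e_r2)
```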
To express the expected values $E(x_1^2 x_2^2)$, $E(x_1^2 x_2 x_3)$, and $E(x_1 x_2 x_3 x_4)$ we shall introduce
the following notation for the symmetric functions of $a_1, \dots, a_N$: For any
set of positive integers $i_1, i_2, \dots, i_k$ the symbol $S_{i_1 i_2 \cdots i_k}$ denotes the
symmetric function $\sum a_{\alpha_1}^{i_1} a_{\alpha_2}^{i_2} \cdots a_{\alpha_k}^{i_k}$, where the summation is to be taken
over all possible sets of $k$ positive integers $\alpha_1, \dots, \alpha_k$ subject to the restriction
that $\alpha_u \le N$ and $\alpha_u \neq \alpha_v$ ($u \neq v$; $u, v = 1, \dots, k$).

From (6) we easily obtain

$$E(R^2) = \frac{N}{N(N-1)} S_{22} + \frac{2N}{N(N-1)(N-2)} S_{211} + \frac{N^2 - 3N}{N(N-1)(N-2)(N-3)} S_{1111} = \frac{S_{22}}{N-1} + \frac{2 S_{211}}{(N-1)(N-2)} + \frac{S_{1111}}{(N-1)(N-2)}. \tag{7}$$

It will probably facilitate computation to express each of the symmetric functions
in the right member of (7) by a sum of terms, each a product of factors
$S_r$ ($r = 1, 2, \dots$). One can easily verify the relationships

$$S_{11} = S_1^2 - S_2 \tag{8}$$

$$S_{12} = S_{21} = S_1 S_2 - S_3 \tag{9}$$

$$S_{13} = S_{31} = S_1 S_3 - S_4 \tag{10}$$

$$S_{22} = S_2^2 - S_4 \tag{11}$$

$$S_{111} = S_{11} S_1 - 2 S_{21} = (S_1^2 - S_2) S_1 - 2(S_1 S_2 - S_3) = S_1^3 - 3 S_1 S_2 + 2 S_3 \tag{12}$$

$$S_{211} = S_{121} = S_{112} = S_{11} S_2 - 2 S_{31} = (S_1^2 - S_2) S_2 - 2(S_1 S_3 - S_4) = S_1^2 S_2 - S_2^2 - 2 S_1 S_3 + 2 S_4 \tag{13}$$

$$S_{1111} = S_{111} S_1 - 3 S_{211} = S_1^4 - 3 S_1^2 S_2 + 2 S_1 S_3 - 3 S_1^2 S_2 + 3 S_2^2 + 6 S_1 S_3 - 6 S_4 = S_1^4 - 6 S_1^2 S_2 + 8 S_1 S_3 + 3 S_2^2 - 6 S_4. \tag{14}$$
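The relationships (8) to (14) can be spot-checked by computing each $S_{i_1 \cdots i_k}$ directly from its definition. In the sketch below (function names ours) the sum runs over ordered tuples of distinct indices, which is the reading under which (8) holds.

```python
from itertools import permutations
from math import prod


def sym(a, exps):
    """S_{i_1...i_k}: sum of a[al_1]**i_1 * ... * a[al_k]**i_k over distinct ordered indices."""
    k = len(exps)
    return sum(prod(a[al] ** e for al, e in zip(idx, exps))
               for idx in permutations(range(len(a)), k))


def power_sums(a):
    """Return S_1, S_2, S_3, S_4."""
    return tuple(sum(v ** r for v in a) for r in range(1, 5))


a = [2, -1, 3, 5, 0, 4]
s1, s2, s3, s4 = power_sums(a)
assert sym(a, (1, 1)) == s1**2 - s2                                          # (8)
assert sym(a, (1, 2)) == s1*s2 - s3                                          # (9)
assert sym(a, (1, 3)) == s1*s3 - s4                                          # (10)
assert sym(a, (2, 2)) == s2**2 - s4                                          # (11)
assert sym(a, (1, 1, 1)) == s1**3 - 3*s1*s2 + 2*s3                           # (12)
assert sym(a, (2, 1, 1)) == s1**2*s2 - s2**2 - 2*s1*s3 + 2*s4                # (13)
assert sym(a, (1, 1, 1, 1)) == s1**4 - 6*s1**2*s2 + 8*s1*s3 + 3*s2**2 - 6*s4  # (14)
print("identities (8)-(14) hold for", a)
```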

It follows from (5) that

$$E(R) = \frac{1}{N-1}\left(S_1^2 - S_2\right), \tag{15}$$

and from (7), (11), (13), (14), and (15) that the variance of $R$ is given by

$$\sigma^2(R) = E(R^2) - [E(R)]^2 = \frac{S_2^2 - S_4}{N-1} + \frac{S_1^4 - 4 S_1^2 S_2 + 4 S_1 S_3 + S_2^2 - 2 S_4}{(N-1)(N-2)} - \frac{(S_1^2 - S_2)^2}{(N-1)^2}. \tag{16}$$





The mean value and variance of $R$ can easily be computed from (15) and (16)
as soon as the values of $S_1$, $S_2$, $S_3$, and $S_4$ have been determined.
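A sketch that evaluates (15) and (16) from the power sums with exact fractions and compares them with the mean and variance obtained by enumerating every ordering of a small sample. The sample, which contains a tie, and the function names are our own; the formulas need $N \ge 3$ because of the factor $N - 2$.

```python
from fractions import Fraction
from itertools import permutations


def circular_r(x):
    """Statistic (4): x_1 x_2 + ... + x_{N-1} x_N + x_N x_1."""
    n = len(x)
    return sum(x[a] * x[(a + 1) % n] for a in range(n))


def mean_var_r(a):
    """Exact E(R) and sigma^2(R) under random ordering, from (15) and (16)."""
    n = len(a)
    if n < 3:
        raise ValueError("formula (16) needs N >= 3")
    s1, s2, s3, s4 = (sum(Fraction(v) ** r for v in a) for r in range(1, 5))
    mean = (s1**2 - s2) / (n - 1)
    var = ((s2**2 - s4) / (n - 1)
           + (s1**4 - 4*s1**2*s2 + 4*s1*s3 + s2**2 - 2*s4) / ((n - 1) * (n - 2))
           - (s1**2 - s2) ** 2 / (n - 1) ** 2)
    return mean, var


a = [1, 2, 2, 3, 5, 7]  # ties allowed; each copy is treated as distinct
values = [circular_r(p) for p in permutations(a)]
m_exact = Fraction(sum(values), len(values))
v_exact = Fraction(sum(v * v for v in values), len(values)) - m_exact ** 2
print(mean_var_r(a) == (m_exact, v_exact))  # True
```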

The formulas (15) and (16) are considerably simplified if $S_1 = 0$. In the
special case that $S_1 = 0$ we have

$$E(R) = -\frac{S_2}{N-1} \tag{15'}$$

and

$$\sigma^2(R) = \frac{S_2^2 - S_4}{N-1} + \frac{S_2^2 - 2 S_4}{(N-1)(N-2)} - \frac{S_2^2}{(N-1)^2}. \tag{16'}$$

We can always make $S_1$ equal to zero by replacing $a_\alpha$ by $b_\alpha = a_\alpha - N^{-1} \sum_{\beta=1}^{N} a_\beta$.
This substitution is permissible, since it changes the statistic $R$ only by an
additive constant and consequently leaves the test procedure unaffected. Thus, in
practical applications it may be convenient to replace $a_\alpha$ by $b_\alpha$ and to use
formulas (15') and (16').
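A sketch of the large-sample procedure: center the observations, standardize $R$ with (15') and (16'), and refer the result to the normal distribution. For comparison it also standardizes the raw observations with (15) and (16); the two values agree because centering shifts $R$ and $E(R)$ by the same constant. The upper-tail $p$-value (suited to a trend alternative) and all names are our own choices.

```python
import math


def circular_r(x):
    """Statistic (4) with circular wrap-around."""
    n = len(x)
    return sum(x[a] * x[(a + 1) % n] for a in range(n))


def z_general(x):
    """Standardized R using (15) and (16) on the raw observations."""
    n = len(x)
    s1, s2, s3, s4 = (sum(v ** r for v in x) for r in range(1, 5))
    mean = (s1**2 - s2) / (n - 1)
    var = ((s2**2 - s4) / (n - 1)
           + (s1**4 - 4*s1**2*s2 + 4*s1*s3 + s2**2 - 2*s4) / ((n - 1) * (n - 2))
           - (s1**2 - s2) ** 2 / (n - 1) ** 2)
    return (circular_r(x) - mean) / math.sqrt(var)


def z_centered(x):
    """Standardized R after replacing a_alpha by b_alpha = a_alpha - mean, using (15') and (16')."""
    n = len(x)
    xbar = sum(x) / n
    b = [v - xbar for v in x]
    s2 = sum(v * v for v in b)
    s4 = sum(v ** 4 for v in b)
    mean = -s2 / (n - 1)
    var = ((s2**2 - s4) / (n - 1)
           + (s2**2 - 2 * s4) / ((n - 1) * (n - 2))
           - s2**2 / (n - 1) ** 2)
    return (circular_r(b) - mean) / math.sqrt(var)


def upper_tail_p(z):
    """Normal-approximation probability P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


data = [2.0, 3.5, 1.0, 4.0, 2.5, 6.0, 0.5, 3.0]
z = z_centered(data)
print(z, z_general(data), upper_tail_p(z))
```

The normal approximation is justified only asymptotically (Section 4), so the $p$-value is meaningful only for reasonably large $N$; the eight observations here merely exercise the code.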


4. Limiting distribution of $R$. Let $\{a_\alpha\}$ ($\alpha = 1, 2, \dots$ ad inf.) be a sequence
of real numbers with the following properties:

a) There exists a sequence of numbers $A_1, A_2, \dots, A_r, \dots$ such that

$$\frac{1}{N}\sum_{\alpha=1}^{N} a_\alpha^r < A_r \qquad (r = 1, 2, \dots \text{ ad inf.}) \tag{17}$$

for all $N$. (This condition means that the moments about the origin of the
sequence $a_1, a_2, \dots, a_N$ are bounded functions of $N$.)

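b)

$$\liminf_{N \to \infty} \left[\frac{1}{N}\sum_{\alpha=1}^{N} a_\alpha^2 - \left(\frac{1}{N}\sum_{\alpha=1}^{N} a_\alpha\right)^2\right] > 0. \tag{18}$$

(This condition means that the dispersion of the $N$ values $a_1, a_2, \dots, a_N$ is
eventually bounded below.)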

Let $R(N)$ be the serial correlation coefficient $R$ as defined in (4), where $x_1, \dots, x_N$
is a random permutation of $a_1, a_2, \dots, a_N$. We shall prove the following

Theorem: As $N \to \infty$, the probability that

$$\frac{R(N) - E(R(N))}{\sigma(R(N))} < t$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{1}{2}x^2}\, dx.$$
For any function $f(N)$ and any positive function $\phi(N)$ let

$$f(N) = O(\phi(N))$$

mean that $|f(N)/\phi(N)|$ is bounded from above for all $N$, and let

$$f(N) = \Omega(\phi(N))$$

mean that $f(N) = O(\phi(N))$ and that $\liminf_{N \to \infty} |f(N)/\phi(N)| > 0$. Also let

$$f(N) = o(\phi(N))$$

mean that

$$\lim_{N \to \infty} \frac{f(N)}{\phi(N)} = 0.$$

Let $[p]$ denote the largest integer less than or equal to $p$.

To simplify the proof we shall temporarily assume:

c) There exists a positive constant $K$ such that, for every positive integral $N$,

$$-K < S_1 = \sum_{\alpha=1}^{N} a_\alpha < K. \tag{19}$$

This restriction will be removed later.

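Lemma 1:

$$\sum \cdots \sum a_{\alpha_1} a_{\alpha_2} \cdots a_{\alpha_k} = O(N^{[\frac{1}{2}k]}),$$

the summation extending, as in Section 3, over all sets of distinct indices $\alpha_1, \dots, \alpha_k$.

Proof: $\sum \cdots \sum a_{\alpha_1} \cdots a_{\alpha_k}$ can be written as the sum of a finite
number of terms, where each term is a product of factors $S_r$ ($r = 1, 2, \dots$).
This representation will be called the normal representation of $\sum \cdots \sum a_{\alpha_1} \cdots a_{\alpha_k}$.
Since $S_1 = O(1)$ by (19) and $S_r = O(N)$ by (17), and since the number of
factors $S_r$ ($r > 1$) in a single term of the normal representation of
$\sum \cdots \sum a_{\alpha_1} \cdots a_{\alpha_k}$ is at most $[\frac{1}{2}k]$, the equation
$\sum \cdots \sum a_{\alpha_1} \cdots a_{\alpha_k} = O(N^{[\frac{1}{2}k]})$ must hold.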

Lemma 2: Let $y = x_1 \cdots x_k z$, where $z = x_{k+1}^{i_1} \cdots x_{k+r}^{i_r}$ and $i_j > 1$ ($j = 1, \dots, r$).
If $(x_1, \dots, x_N)$ is a random permutation of $a_1, \dots, a_N$, and if $k, r, i_1, \dots, i_r$
are fixed values independent of $N$, then $E(y) = O(N^{[\frac{1}{2}k] - k})$.

Proof: Let $E(y \mid x_{k+1}, \dots, x_{k+r})$ be the conditional expected value of $y$ when
$x_{k+1}, \dots, x_{k+r}$ are fixed. It follows easily from Lemma 1 that

$$E(y \mid x_{k+1}, \dots, x_{k+r}) = O(N^{[\frac{1}{2}k] - k}).$$

Hence also $E(y) = O(N^{[\frac{1}{2}k] - k})$, and Lemma 2 is proved.
Denote $x_\alpha x_{\alpha+1}$ by $y_\alpha$ ($\alpha = 1, \dots, N-1$) and $x_N x_1$ by $y_N$, and consider the
expansion of $(y_1 + \cdots + y_N)^r$. Let $y$ be a term of this expansion, i.e.,

$$y = \frac{r!}{r_1! \cdots r_u!}\, y_{\alpha_1}^{r_1} \cdots y_{\alpha_u}^{r_u} \qquad (\alpha_1 < \alpha_2 < \cdots < \alpha_u).$$

We will say that two factors $y_\alpha$ and $y_\beta$ are neighbors if $|\alpha - \beta + 1|$ or $|\alpha - \beta - 1|$ is either 0 or $N$. The set of
$u$ factors $y_{\alpha_1}, \dots, y_{\alpha_u}$ can be subdivided into cycles as follows: The first cycle
contains $y_{\alpha_1}$ and all those $y_\alpha$ which can be reached from $y_{\alpha_1}$ by a succession of
neighboring $y_\alpha$. The second cycle contains the first $y_\alpha$ of the remaining
sequence and all those $y_\alpha$ which can be reached from it by a succession of
neighboring $y_\alpha$. The third cycle is similarly constructed from the remaining
sequence, etc. After a finite number of cycles have been withdrawn the sequence
will be exhausted. If $m$ is the number of such cycles we will say that $y$ has $m$
cycles.
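To make the notions of cycles and linear factors concrete, the sketch below represents a term $\prod y_\alpha^{e_\alpha}$ as a mapping from $\alpha$ (0-based, on a circle of $N$ positions) to its exponent, a convention of ours. It counts $m$ and $k$ as defined above and, for $N = 12$ and $r \le 5$, checks the statement of Lemma 3 below by exhausting all terms. This is an illustration for one small case, not a proof.

```python
from collections import Counter
from itertools import combinations_with_replacement


def cycles_and_linear_factors(term, n):
    """For a term prod y_alpha^{e_alpha} (term: dict alpha -> e_alpha) return (m, k):
    the number of cycles and the number of x's that appear to the first power."""
    # Cycles: groups of factors connected through circular neighbors alpha - 1, alpha + 1.
    remaining = set(term)
    m = 0
    while remaining:
        m += 1
        stack = [remaining.pop()]
        while stack:
            alpha = stack.pop()
            for beta in ((alpha - 1) % n, (alpha + 1) % n):
                if beta in remaining:
                    remaining.remove(beta)
                    stack.append(beta)
    # y_alpha = x_alpha x_{alpha+1}, so x_i collects the exponents of y_i and y_{i-1}.
    x_exp = Counter()
    for alpha, e in term.items():
        x_exp[alpha] += e
        x_exp[(alpha + 1) % n] += e
    k = sum(1 for e in x_exp.values() if e == 1)
    return m, k


def lemma3_max(r, n):
    """Largest value of m + [k/2] - k over all terms of (y_1 + ... + y_n)^r."""
    best = None
    for combo in combinations_with_replacement(range(n), r):
        m, k = cycles_and_linear_factors(Counter(combo), n)
        value = m + k // 2 - k
        best = value if best is None else max(best, value)
    return best


print([lemma3_max(r, 12) for r in range(1, 6)])  # [0, 1, 1, 2, 2], i.e. [r/2]
```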

Lemma 3: Let $y$ be a term of the expansion $(x_1x_2 + \cdots + x_Nx_1)^r = (y_1 + \cdots + y_N)^r$
($r$ fixed). Let $m$ be the number of cycles in $y$ and $k$ be the number of linear
factors in $y$ if $y$ is written as a function of $x_1, \dots, x_N$ (i.e., if we replace $y_\alpha$ by
$x_\alpha x_{\alpha+1}$). Then the maximum value of $m + [\frac{1}{2}k] - k$ is equal to $[\frac{1}{2}r]$.

Proof: First we maximize $m + [\frac{1}{2}k] - k$ with respect to $k$ when $m$ is fixed.
If $m \le [\frac{1}{2}r]$, then the minimum value of $k$ is obviously zero. Let $m = [\frac{1}{2}r] + r'$
($r' > 0$). The minimum value of $k$ is reached if each cycle consists of a single
factor $y_\alpha$ and if each factor $y_\alpha$ in $y$ is either linear or squared. If $r$ is even, then
the minimum value of $k$ is $4r'$, and if $r$ is odd, then the minimum value of $k$ is
$4r' - 2$. Hence for $m = [\frac{1}{2}r] + r'$ we have

$$\max(m + [\tfrac{1}{2}k] - k) = [\tfrac{1}{2}r] - r' \quad \text{if } r \text{ is even}$$

and

$$\max(m + [\tfrac{1}{2}k] - k) = [\tfrac{1}{2}r] - r' + 1 \quad \text{if } r \text{ is odd}.$$

Hence, maximizing with respect to $m$ and $k$, we obtain

$$\max(m + [\tfrac{1}{2}k] - k) = [\tfrac{1}{2}r],$$

and Lemma 3 is proved.

Lemma 4: The expected value of the sum of all those terms in the expansion of
$(x_1x_2 + \cdots + x_Nx_1)^r$ for which $m$ is the number of cycles and $k$ the number of linear
factors (if $y$ is expressed in terms of $x_1, \dots, x_N$) is equal to $O(N^{m + [\frac{1}{2}k] - k})$.

This Lemma follows from Lemma 2 and the fact that the number of terms $y$
with the required properties is $O(N^m)$.

Lemma 5:

$$E(x_1x_2 + \cdots + x_Nx_1)^r = O(N^{[\frac{1}{2}r]}).$$

This follows from Lemmas 3 and 4.

Lemma 6: If $r$ is even, then

$$E(x_1x_2 + \cdots + x_Nx_1)^r = 2^{-\frac{1}{2}r}\, r!\, C_{N,\frac{1}{2}r}\, E(x_1^2 \cdots x_r^2) + o(N^{\frac{1}{2}r}),$$

where $C_{N,j}$ denotes the binomial coefficient.

Proof: It follows easily from our considerations in proving Lemma 3 that
$m + [\frac{1}{2}k] - k < \frac{1}{2}r$ for all terms in the expansion of $(x_1x_2 + \cdots + x_Nx_1)^r$ which
are not of the type $x_{\alpha_1}^2 \cdots x_{\alpha_r}^2$. Hence it follows from Lemma 4 that the expected
value of the sum of all those terms in the expansion of $(x_1x_2 + \cdots + x_Nx_1)^r$
which are not of the type $x_{\alpha_1}^2 \cdots x_{\alpha_r}^2$ is equal to $o(N^{\frac{1}{2}r})$. Lemma 6 follows from the
fact that $2^{-\frac{1}{2}r} r!$ is the coefficient of the terms of the type $x_{\alpha_1}^2 \cdots x_{\alpha_r}^2$ in the expansion
of $(x_1x_2 + \cdots + x_Nx_1)^r$ and that the number of terms of such type is equal to
$C_{N,\frac{1}{2}r} + O(N^{\frac{1}{2}r - 1})$.

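Lemma 7.

$$\lim_{N\to\infty}\frac{E(x_1x_2+\cdots+x_Nx_1)^r}{\left[E(x_1x_2+\cdots+x_Nx_1)^2\right]^{\frac{1}{2}r}} = \begin{cases} 0 & \text{if } r \text{ is odd}, \\ \dfrac{r!}{2^{\frac{1}{2}r}(\frac{1}{2}r)!} & \text{if } r \text{ is even}. \end{cases}$$

Proof: From Lemma 6 it follows that

$$E(x_1x_2 + \cdots + x_Nx_1)^2 = N E(x_1^2 x_2^2) + o(N) = \Omega(N). \tag{20}$$

The first half of Lemma 7 follows from Lemma 5 and equation (20). If $r$
is even, then it follows from (20) and Lemma 6 that

$$\lim_{N\to\infty}\frac{E(x_1x_2+\cdots+x_Nx_1)^r}{\left[E(x_1x_2+\cdots+x_Nx_1)^2\right]^{\frac{1}{2}r}} = \lim_{N\to\infty}\frac{2^{-\frac{1}{2}r} C_{N,\frac{1}{2}r}\, r!\, E(x_1^2 \cdots x_r^2)}{N^{\frac{1}{2}r}\left[E(x_1^2 x_2^2)\right]^{\frac{1}{2}r}} = \lim_{N\to\infty}\frac{r!}{2^{\frac{1}{2}r}(\frac{1}{2}r)!}\cdot\frac{E(x_1^2 \cdots x_r^2)}{\left[E(x_1^2 x_2^2)\right]^{\frac{1}{2}r}}. \tag{21}$$

It follows from (17), (19), and the normal representation of symmetric functions that

$$k! \sum_{\alpha_1 < \alpha_2 < \cdots < \alpha_k} a_{\alpha_1}^2 \cdots a_{\alpha_k}^2 = S_2^k + O(N^{k-1}).$$

From (17) and (18) we have $S_2 = \Omega(N)$. Since

$$E(x_1^2 \cdots x_r^2) = r! \left(\sum_{\alpha_1 < \alpha_2 < \cdots < \alpha_r} a_{\alpha_1}^2 \cdots a_{\alpha_r}^2\right) \left[N(N-1)\cdots(N-r+1)\right]^{-1},$$

we obtain

$$\lim_{N\to\infty}\frac{E(x_1^2 \cdots x_r^2)}{\left[E(x_1^2 x_2^2)\right]^{\frac{1}{2}r}} = 1. \tag{22}$$

The second half of Lemma 7 follows from (21) and (22).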

Lemma 8:

$$\lim_{N\to\infty}\frac{E(R(N))}{\sigma(R(N))} = 0, \tag{23}$$

$$\lim_{N\to\infty}\frac{E(R(N)^2)}{\sigma^2(R(N))} = 1. \tag{24}$$

Proof: Equation (24) is a trivial consequence of (23). From (15), $E(R) = O(1)$, and
from (16), $\sigma(R) = \Omega(N^{\frac{1}{2}})$. The lemma follows easily from these relations.
Proof of the Theorem: According to Lemma 7 the $r$-th moment of $R(N)/[E(R(N)^2)]^{\frac{1}{2}}$
approaches the $r$-th moment of the normal distribution with zero mean and unit variance
as $N \to \infty$. From this and Lemma 8 the required result follows if condition (c) holds.
It remains therefore merely to remove condition (c). Assume now only that
$a_1, a_2, \dots, a_\alpha, \dots$ satisfy conditions (a) and (b).

$R(N)$ is formed from the population of values $a_1, a_2, \dots, a_N$. Addition of
a constant $g$ to $a_1, \dots, a_N$ adds the same constant to all the values of $R(N)$ and
hence leaves $[R(N) - E(R(N))]/\sigma(R(N))$ unaltered. Let $g^{(N)} = -\frac{1}{N}\sum_{\alpha=1}^{N} a_\alpha$
and write $b_\alpha^{(N)} = a_\alpha + g^{(N)}$. Consider the sequences

$$B^{(j)} = b_1^{(j)}, b_2^{(j)}, \dots \qquad (j = 1, 2, \dots \text{ ad inf.}).$$

From (17) it follows that the $|g^{(N)}|$ are bounded for all $N$. Hence the
sequences $B^{(j)}$ satisfy condition (a). They obviously satisfy condition (c). Since
condition (b) is invariant under addition of a constant, we have

$$\liminf_{j\to\infty}\left[\frac{1}{j}\sum_{\alpha=1}^{j}\left(b_\alpha^{(j)}\right)^2 - \left(\frac{1}{j}\sum_{\alpha=1}^{j} b_\alpha^{(j)}\right)^2\right] > 0,$$

so that the $B^{(j)}$ satisfy condition (b). Since $[R(N) - E(R(N))]/\sigma(R(N))$ has
the same distribution in the sequence $a_1, a_2, \dots, a_N$ as in the sequence $B^{(N)}$,
the theorem follows.
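The theorem can be illustrated (not proved) by simulation. The sketch below takes $N = 100$ values drawn from a normal distribution, an assumption made only for this illustration, draws random orderings, standardizes $R$ with the exact mean and standard deviation from (15) and (16), and reports the sample mean, variance, skewness and kurtosis of the standardized values. Under the theorem these should be near $0$, $1$, $0$ and $3$; the seed, $N$ and number of draws are arbitrary choices of ours.

```python
import math
import random


def circular_r(x):
    """Statistic (4) with circular wrap-around."""
    n = len(x)
    return sum(x[a] * x[(a + 1) % n] for a in range(n))


def exact_mean_sd(a):
    """E(R) and sigma(R) from (15) and (16)."""
    n = len(a)
    s1, s2, s3, s4 = (sum(v ** r for v in a) for r in range(1, 5))
    mean = (s1**2 - s2) / (n - 1)
    var = ((s2**2 - s4) / (n - 1)
           + (s1**4 - 4*s1**2*s2 + 4*s1*s3 + s2**2 - 2*s4) / ((n - 1) * (n - 2))
           - (s1**2 - s2) ** 2 / (n - 1) ** 2)
    return mean, math.sqrt(var)


def standardized_moments(a, draws, seed=1):
    """Mean, variance, skewness and kurtosis of (R - E R) / sigma(R) over random orderings of a."""
    rng = random.Random(seed)
    mean, sd = exact_mean_sd(a)
    x = list(a)
    zs = []
    for _ in range(draws):
        rng.shuffle(x)
        zs.append((circular_r(x) - mean) / sd)
    zbar = sum(zs) / draws
    m2 = sum((z - zbar) ** 2 for z in zs) / draws
    m3 = sum((z - zbar) ** 3 for z in zs) / draws
    m4 = sum((z - zbar) ** 4 for z in zs) / draws
    return zbar, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


gen = random.Random(0)
a = [gen.gauss(0.0, 1.0) for _ in range(100)]
print(standardized_moments(a, draws=4000))  # roughly (0, 1, 0, 3)
```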

It should be remarked that the theorem remains valid if conditions (a) and
(b) are replaced by the weaker condition

$$\mu_r / \mu_2^{\frac{1}{2}r} = O(1) \qquad (r = 3, 4, \dots \text{ ad inf.}),$$

where



This follows easily from the fact that $[R(N) - E(R(N))]/\sigma(R(N))$ remains
unaltered if we replace the sequence $a_1, \dots, a_N$ by the sequence $c_1, \dots, c_N$,

where



Conditions (a) and (b) are obviously satisfied by the sequence $c_1, \dots, c_N$.

5. Transformation of the original observations.

Let $f(t)$ be a continuous and strictly monotonic function of $t$ ($-\infty < t < +\infty$).
Suppose we replace the original observations $a_1, \dots, a_N$ by $d_1, \dots, d_N$, where
$d_\alpha = f(a_\alpha)$ ($\alpha = 1, \dots, N$). We obtain a valid test of significance if we carry
out the test procedure as if $d_1, \dots, d_N$ were the observed values instead of
$a_1, \dots, a_N$. We could also replace the observed values $a_1, \dots, a_N$ by their
ranks. The question arises whether there is any advantage in making the test
on the transformed values instead of on the original observations. It may well
be that by certain transformations we could considerably increase the power of
the test with respect to the alternatives under consideration. This problem needs
further study.
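Replacing the observations by their ranks is one such transformation. The sketch below, which assumes distinct observations (continuous $F$) and uses names of our own, standardizes $R$ computed on ranks with (15) and (16) and shows that the result is the same whether the ranks come from the raw data, from an increasing transformation, or from a decreasing one. For a decreasing $f$ the ranks are reversed, $r \mapsto N + 1 - r$, which leaves the circular sum $R$ unchanged.

```python
import math


def circular_r(x):
    """Statistic (4) with circular wrap-around."""
    n = len(x)
    return sum(x[a] * x[(a + 1) % n] for a in range(n))


def ranks(x):
    """Ranks 1..N of distinct observations (ties are not handled in this sketch)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def standardized_r(x):
    """(R - E R) / sigma(R) under random ordering, via (15) and (16)."""
    n = len(x)
    s1, s2, s3, s4 = (sum(v ** p for v in x) for p in range(1, 5))
    mean = (s1**2 - s2) / (n - 1)
    var = ((s2**2 - s4) / (n - 1)
           + (s1**4 - 4*s1**2*s2 + 4*s1*s3 + s2**2 - 2*s4) / ((n - 1) * (n - 2))
           - (s1**2 - s2) ** 2 / (n - 1) ** 2)
    return (circular_r(x) - mean) / math.sqrt(var)


obs = [0.3, 1.7, 0.9, 2.4, 3.1, 2.0, 4.2, 3.8]
z_raw = standardized_r(ranks(obs))
z_exp = standardized_r(ranks([math.exp(v) for v in obs]))  # increasing f
z_neg = standardized_r(ranks([-v ** 3 for v in obs]))      # decreasing f
print(z_raw, z_exp, z_neg)  # all three agree
```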

6. Summary. A test procedure based on serial correlation is given for testing
the hypothesis that $x_1, \dots, x_N$ are independent observations from the same
population, i.e., that $x_1, \dots, x_N$ is a random series. By considering the
distribution of the serial correlation coefficient in the subpopulation consisting of
all permutations of the actually observed values, a test procedure is obtained
such that

a) if the common c.d.f. $F(x)$ is continuous, the size of the critical region,
i.e., the probability of rejecting the hypothesis of randomness when it is true,
does not depend upon $F(x)$;

b) if $F(x)$ is not continuous but all its moments are finite and its variance is
positive, the size of the critical region approaches, as $N \to \infty$, the value it
would have if $F(x)$ were continuous. Thus in the limit an exact test is
possible in this case as well.

It is shown that the test based on the serial correlation with lag $h$ is equivalent
to the test based on the statistic⁶

$$R'_h = \sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha},$$

where $x_{h+\alpha}$ is to be replaced by $x_{h+\alpha-N}$ for all values of $\alpha$ for which $h + \alpha > N$.
If $h$ is prime to $N$, the distribution of $\sum_{\alpha=1}^{N} x_\alpha x_{h+\alpha}$ is exactly the same as the
distribution of $R = \sum_{\alpha=1}^{N} x_\alpha x_{1+\alpha}$.

The mean value and variance of $R$ are given by the following expressions:

$$E(R) = \frac{S_1^2 - S_2}{N-1}$$

and

$$\sigma^2(R) = \frac{S_2^2 - S_4}{N-1} + \frac{S_1^4 - 4 S_1^2 S_2 + 4 S_1 S_3 + S_2^2 - 2 S_4}{(N-1)(N-2)} - \frac{(S_1^2 - S_2)^2}{(N-1)^2},$$

where $S_r = x_1^r + \cdots + x_N^r$.

It is shown that under some mild restrictions the limiting distribution of $R$ is
normal. The test procedure can therefore be easily carried out when $N$ is
sufficiently large to permit the use of the limiting distribution of $R$.
ON A GENERAL CLASS OF “CONTAGIOUS” DISTRIBUTIONS

By W. Feller
Brown University

1. Introduction. In a paper of considerable interest, J. Neyman [11] recently
discussed frequently occurring situations where the usual tests of significance
fail. He discussed, in particular, experiments in entomology and bacteriology
which cannot be described by the usual distribution functions, and he constructed
several new types of apparently contagious distributions. At first glance
Neyman's investigation may seem of a rather specialized nature, and his
distributions of restricted applicability. It may therefore be useful to point out
that they are intimately related to results obtained by various authors in
connection with topics having so little apparent relation as accident statistics,
telephone traffic, fire damage, sickness and life insurance, risk theory, and even an
engineering problem. Viewed in the proper light of a general theory, Neyman's
method is particularly closely related to some too little known considerations by
Greenwood and Yule [6]. These authors were the first to find, and apply, the
distribution which shortly afterwards was independently rediscovered by
Eggenberger and Polya¹ [3, 4].

Greenwood and Yule discussed two types of what may conveniently be called
contagion: with one type there is true contagion in the sense of Polya and
Eggenberger, where each “favorable” event increases (or decreases) the probability
of future favorable events; with the second type the events are, strictly
speaking, independent, and an apparent contagion is actually due to an inhomogeneity
of the population. The two explanations are very different in nature as well as
in practical implications. It is therefore most remarkable that Greenwood and
Yule found their distribution assuming an apparent contagion; in their opinion
this distribution contradicts true contagion. On the contrary, Polya and
Eggenberger arrived at the same distribution assuming true contagion, while the
possibility of an apparent contagion due to inhomogeneity seems not to have been
noticed by them. The Greenwood-Yule-Polya-Eggenberger distribution has
found many applications.² Therefore the possibility of its interpretation in two
ways, diametrically opposite in their nature as well as in their implications, is of
the greatest statistical significance. This fact is, incidentally, a justification for
general theories in statistics.

We shall see that Neyman's contagious distributions belong to the second
type and are related to the Polya-Eggenberger distribution only if the latter is
interpreted in the sense of Greenwood and Yule. In Neyman's case as well as
in the other cases referred to above we are concerned with inhomogeneous
populations, and there exists an extremely simple device to describe such situations
appropriately. Once stated, this device will appear trivial. Nevertheless, a
straightforward application of it would have avoided considerable mathematical
difficulties in the literature and, occasionally, yielded better and simpler results.
It also seems to be the simplest description of the mechanism behind many observed
distributions, and therefore suited for a theory of tests.³

¹ The fact that the Polya-Eggenberger distribution is identical with the Greenwood-Yule
distribution seems to be mentioned in the literature only in a Stockholm thesis by
O. Lundberg [9].

² Of quite recent applications we mention Kitagawa and Huruya [8], Rosenblatt [16],
O. Lundberg [9]. Only the latter seems aware of the double nature of the distribution.

To start in a purely formal manner, consider an arbitrary cumulative distribution
function (c.d.f.) $F(x, a)$, depending on a parameter $a$, and another c.d.f.
$U(a)$. Then

$$G(x) = \int F(x, a)\, dU(a) \tag{1.1}$$

(the integration extending over the domain of variation of $a$) is again a c.d.f.
If, in particular, $U(a)$ is a step function, (1.1) reduces to

$$G(x) = \sum_i p_i F(x, a_i), \tag{1.2}$$

where $p_i$ is the weight attached to $a_i$ (we have, of course, $p_i \ge 0$, $\sum_i p_i = 1$).
Instead of (1.2) one can write more simply

$$G(x) = \sum_i p_i F_i(x), \tag{1.3}$$

where the $F_i(x)$ are arbitrary c.d.f.'s. Of course, $F(x, a)$ and $U(a)$ may depend
on additional parameters, and the procedure can be repeated.

The statistical meaning of (1.3) is clear. Consider a population made up of
several subgroups $A_1, A_2, \dots$, mixed at random in proportions $p_1 : p_2 : \cdots$.
If $F_i(x)$ is the c.d.f. of some character in $A_i$, then $G(x)$, as defined by (1.3), will
represent the c.d.f. of that character in the total population, provided that the
subgroups $A_i$ are statistically independent. Similarly, (1.1) describes an
infinitely composite population. Postponing a discussion of the property of
contagion to the last section, we shall first deduce a few properties of the compound
Poisson distribution, considered first by Greenwood and Yule. Neyman's
“Contagious Distributions of Type A” as well as the Polya-Eggenberger
distribution belong to this class. Our next example of a special case of (1.1) is what
F. E. Satterthwaite [16] called the “Generalized Poisson Distribution.” It has
been independently discovered by many authors and represents heterogeneity
of quite a different nature. Instead of further examples we shall, in the fourth
section, show how Neyman's most general contagious distribution can be
deduced by a repeated application of (1.1).

³ Incidentally, attention may be drawn to an argument by Greenwood and Yule showing
that the $\chi^2$-test when applied to the Poisson distribution is biased and tends to exaggerate
the goodness of fit. The argument could be amplified from other experience.
Notation: If $F(x)$ and $G(x)$ are the c.d.f.'s of two independent variates $X$ and
$Y$, then their convolution (that is to say, the c.d.f. of $X + Y$) will be denoted by
$F(x) * G(x)$. Thus

$$F(x) * G(x) = \int_{-\infty}^{\infty} F(x - y)\, dG(y). \tag{1.4}$$

More particularly, we shall write

$$F(x) * F(x) = F^{2*}(x), \qquad F^{n*}(x) * F(x) = F^{(n+1)*}(x). \tag{1.5}$$

We shall denote by $E(x)$ the unitary c.d.f.

$$E(x) = \begin{cases} 0 & \text{for } x < 1, \\ 1 & \text{for } x \ge 1, \end{cases}$$

so that $E^{n*}(x) = 0$ for $x < n$, and $1$ for $x \ge n$.
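For variates on the non-negative integers, the convolution (1.4) amounts to convolving probability masses. The sketch below works with probability mass functions rather than c.d.f.'s (our simplification), checks that $E^{3*}$ is the unit mass at 3, and uses the standard fact that the sum of independent Poisson variates is again Poisson as a numerical check. All names are our own.

```python
import math


def convolve(p, q):
    """pmf of X + Y for independent integer variates with pmfs p and q (lists indexed from 0)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def power(p, n):
    """n-fold convolution p^{n*} for n >= 1."""
    result = p
    for _ in range(n - 1):
        result = convolve(result, p)
    return result


def poisson_pmf(a, nmax):
    """Poisson probabilities for 0, 1, ..., nmax."""
    return [math.exp(-a) * a ** n / math.factorial(n) for n in range(nmax + 1)]


unit = [0.0, 1.0]      # unit mass at 1, i.e. the jump of E(x)
print(power(unit, 3))  # [0.0, 0.0, 0.0, 1.0]: E^{3*} jumps at x = 3
```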

2. The compound Poisson distribution. Consider the well-known Poisson
expression

$$\pi(n; a) = e^{-a}\frac{a^n}{n!}, \tag{2.1}$$

where the parameter $a > 0$ gives the expected number of “events”. We shall
refer to (2.1) as the simple Poisson distribution. If different individuals of a
population are associated with different values of $a$, and if the character $a$ is
distributed according to the cumulative probability law $U(a)$, the probability
of $n$ events in the total population will be given by

$$\pi_n = \int_0^\infty e^{-a}\frac{a^n}{n!}\, dU(a). \tag{2.2}$$

Following Greenwood and Yule we shall refer to (2.2) as the compound Poisson
distribution. Referring to the last section for an interpretation, we first
consider a few special cases.

a) If $U(a)$ is a step function, we are led to expressions of the form

$$\pi_n = \sum_i p_i e^{-a_i}\frac{a_i^n}{n!}. \tag{2.3}$$

Such a distribution has been successfully applied by C. Palm [12] to problems of
telephone traffic, and by O. Lundberg [9] to sickness statistics.
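A short sketch of (2.3): the Poisson probabilities averaged over a step-function $U(a)$. The weights and rates are illustrative values of our own. Summing the probabilities over a long enough range of $n$ recovers total mass 1 and the mean $m = \sum_i p_i a_i$; the variance equals $m$ plus the variance of $U$, the standard property of mixed Poisson laws.

```python
import math


def poisson(n, a):
    """Simple Poisson probability pi(n; a) of (2.1)."""
    return math.exp(-a) * a ** n / math.factorial(n)


def compound_poisson_step(n, weights, rates):
    """pi_n of (2.3): the Poisson probability averaged over a step-function U(a)."""
    return sum(p * poisson(n, a) for p, a in zip(weights, rates))


weights, rates = [0.6, 0.3, 0.1], [0.5, 2.0, 6.0]
pmf = [compound_poisson_step(n, weights, rates) for n in range(80)]
mean = sum(n * p for n, p in enumerate(pmf))
var = sum(n * n * p for n, p in enumerate(pmf)) - mean ** 2
print(round(sum(pmf), 12), round(mean, 12), round(var, 9))  # 1.0, m = 1.5, 1.5 + 2.7 = 4.2
```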

b) If U(a) is a Pearson Type III distribution 



(2.4) 
(with $d > 0$, $A > 0$), then



This is the Polya-Eggenberger distribution in its usual form, and has in this form
(with a slight change of notation) been derived by Greenwood and Yule.

c) If $a$ takes on the values $kc$ only, where $c > 0$ is a constant and $k = 0, 1, \dots$,
and if $a$ is distributed according to the Poisson law

$$\text{Prob}\{a = kc\} = e^{-\lambda}\frac{\lambda^k}{k!}, \tag{2.6}$$

then 

This is Neyman's contagious distribution of type A depending on two parameters
(cf. section 4). If, instead, $a$ is distributed according to a multiple Poisson law
of form (2.3), we arrive at Neyman's more-parametric distribution of type A.
They are, of course, essentially linear combinations of expressions of form (2.7).
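Neyman's type A law arises from (2.2) when $a = kc$ with $k$ Poisson-distributed, so it can be evaluated by summing over $k$. In this sketch `lam` stands for the Poisson mean of $k$ (our label), the sum over $k$ is truncated at `kmax` (our choice), and the Poisson probabilities are computed through logarithms to avoid overflow. As a check it uses the standard fact that a mixed Poisson variate has mean $E(a)$ and variance $E(a) + \mathrm{Var}(a)$, here $\lambda c$ and $\lambda c(1 + c)$.

```python
import math


def poisson(n, a):
    """Simple Poisson probability pi(n; a) of (2.1), evaluated through logarithms."""
    if a == 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(-a + n * math.log(a) - math.lgamma(n + 1))


def neyman_type_a(n, lam, c, kmax=60):
    """pi_n for a = k*c with k ~ Poisson(lam); the sum over k is truncated at kmax."""
    return sum(poisson(k, lam) * poisson(n, k * c) for k in range(kmax + 1))


lam, c = 2.0, 1.5  # illustrative parameter values
pmf = [neyman_type_a(n, lam, c) for n in range(121)]
mean = sum(n * p for n, p in enumerate(pmf))
var = sum(n * n * p for n, p in enumerate(pmf)) - mean ** 2
print(mean, var)  # approximately lam*c = 3.0 and lam*c*(1 + c) = 7.5
```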

It follows from the theory of Laplace transforms that two compound Poisson
distributions associated with different c.d.f.'s $U(a)$ are never identical.

The compound Poisson distribution gives a simple explanation of a phenomenon
recorded by Neyman and observable in many instances. In the experiments
described by Neyman “the attempts to fit the Poisson Law ··· failed
almost invariably with the characteristic feature that, as compared with the
Poisson Law, there were too many empty plots and too few plots with only one
larva”. It is easily checked in the literature that similar situations arise
frequently. Now the Poisson distribution is usually fitted by the method of
moments. Accordingly, the compound Poisson law (2.2) ought to be compared
with the simple Poisson distribution with the same mean value. The mean
value of (2.2) is

( 2 . 8 ) m^j^adUia), 

so that (2.2) ought to be compared with the Poisson distribution ir(n; m). Now, 
whatever the e.df. Via), we have always 

(2.9) TO > ir(0, m) 
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As a matter of fact, using Lagrange’s form for the remainder in Taylor’s for- 
mula, we have 


(2.11)  $\pi_0 = \int_0^\infty e^{-a}\, dU(a) = e^{-m}\int_0^\infty e^{m-a}\, dU(a) \ge e^{-m}\int_0^\infty \{1 + (m - a)\}\, dU(a) = e^{-m} = \pi(0; m),$

which proves (2.9). Similarly


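(2.12)  $m\pi_0 - \pi_1 = \int_0^\infty e^{-a}(m - a)\, dU(a) \ge e^{-m}\int_0^\infty (m - a)\, dU(a) = 0,$

since $e^{-a}(m - a) \ge e^{-m}(m - a)$ for every $a \ge 0$ (for a < m we have $e^{-a} > e^{-m}$ and m - a > 0; for a > m the factor m - a is negative and $e^{-a} < e^{-m}$), which proves (2.10).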

The above theorem shows that, whenever the material under observation is not
quite homogeneous so that the compound Poisson law applies instead of the simple
one, there will be too many cases with "no event" and, as compared with these cases,
too few with "one event". It should be noticed, however, that it is not strictly
true that always

(2.13)  $\pi_1 \le \pi(1; m).$

As a matter of fact, even in the numerical example given by Neyman, the
computed value of $\pi_1$ exactly equals the observed value. Still, the inequality (2.13)
will hold whenever the third moment about the mean of U(a) is smaller than
twice the second. Writing


(2.14)  $\sigma^2 = \int_0^\infty (a - m)^2\, dU(a), \qquad M = \int_0^\infty (a - m)^3\, dU(a),$

and using two more terms in the Taylor development of $e^{m-a}$ than in (2.11) and
(2.12), we see that

(2.15)  $\pi_0 \ge e^{-m}\{1 + \tfrac{1}{2}\sigma^2 - \tfrac{1}{6}M\}$

and

(2.16)  $m\pi_0 - \pi_1 \ge e^{-m}\{\sigma^2 - \tfrac{1}{2}M\}.$

These inequalities are slightly sharper than (2.9) and (2.10), and often permit us
to estimate the variance of U(a).
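The inequalities (2.9), (2.10), (2.15) and (2.16) can be spot-checked numerically. The sketch below assumes U(a) is a step function of the form (2.3) with randomly drawn jump points and weights (our own test setup, not anything from the paper) and verifies all four bounds for many such U; a small tolerance absorbs floating-point rounding in the equality case of a single jump.

```python
import math
import random


def summary(points, weights):
    """m, sigma^2, M (2.8, 2.14) and pi_0, pi_1 (2.3) for a step-function U."""
    m = sum(w * a for a, w in zip(points, weights))
    var = sum(w * (a - m) ** 2 for a, w in zip(points, weights))
    third = sum(w * (a - m) ** 3 for a, w in zip(points, weights))
    pi0 = sum(w * math.exp(-a) for a, w in zip(points, weights))
    pi1 = sum(w * a * math.exp(-a) for a, w in zip(points, weights))
    return m, var, third, pi0, pi1


def bounds_hold(points, weights, tol=1e-12):
    m, var, third, pi0, pi1 = summary(points, weights)
    e = math.exp(-m)
    return (pi0 >= e - tol                                     # (2.9)
            and m * pi0 - pi1 >= -tol                          # (2.10)
            and pi0 >= e * (1 + var / 2 - third / 6) - tol     # (2.15)
            and m * pi0 - pi1 >= e * (var - third / 2) - tol)  # (2.16)


def random_trials(trials=1000, seed=1):
    """Check the four inequalities on randomly generated step functions U."""
    rng = random.Random(seed)
    for _ in range(trials):
        k = rng.randint(1, 5)
        points = [rng.uniform(0.0, 10.0) for _ in range(k)]
        raw = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(raw)
        weights = [r / total for r in raw]
        if not bounds_hold(points, weights):
            return False
    return True


print(random_trials())
```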

We note furthermore that the variance of the compound Poisson distribution is

(2.17)  $\sigma^2 + m,$

as compared with the variance m of the corresponding simple Poisson distribution.
Finally, the following important property of the compound distribution may be
mentioned: Consider two independent variates X and Y distributed according to
two compound Poisson distributions $\{\pi_n^{(1)}\}$ and $\{\pi_n^{(2)}\}$ associated with the c.d.f.'s
$U_1(a)$ and $U_2(a)$, respectively. Then the variate X + Y is distributed according to
a compound Poisson law $\{\pi_n\}$ associated with the c.d.f. $U(a) = U_1(a) * U_2(a)$
(cf. (1.4)).
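The variance formula (2.17) is easy to confirm for a concrete mixing law. The sketch assumes a hypothetical step function U with three jumps, computes the compound probabilities (2.3) up to a generous cut-off, and compares their variance with $\sigma^2 + m$.

```python
import math


def compound_poisson_pmf(points, weights, nmax):
    """pi_n of (2.3), n = 0..nmax, for a step-function U with the given jumps."""
    return [sum(w * math.exp(-a) * a ** n / math.factorial(n)
                for a, w in zip(points, weights))
            for n in range(nmax + 1)]


points, weights = [0.5, 2.0, 6.0], [0.3, 0.5, 0.2]   # hypothetical U
probs = compound_poisson_pmf(points, weights, 80)
mean = sum(n * p for n, p in enumerate(probs))
variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
m = sum(w * a for a, w in zip(points, weights))                   # (2.8)
sigma2 = sum(w * (a - m) ** 2 for a, w in zip(points, weights))   # (2.14)
print(variance, sigma2 + m)
```

The excess $\sigma^2$ over the Poisson variance m is exactly the variance of the mixing law.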

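It suffices to note that $U_i(a) = 0$ for a < 0, so that

$U(a) = \int_0^\infty U_1(a - s)\, dU_2(s);$

therefore, after a permitted change of the order of integration,

$\pi_n = \int_0^\infty dU_2(s)\int_0^\infty e^{-a}\frac{a^n}{n!}\, dU_1(a - s)$
$\quad = \int_0^\infty dU_2(s)\int_0^\infty e^{-(x+s)}\frac{(x+s)^n}{n!}\, dU_1(x)$
$\quad = \sum_{k=0}^{n}\int_0^\infty e^{-x}\frac{x^k}{k!}\, dU_1(x)\int_0^\infty e^{-s}\frac{s^{n-k}}{(n-k)!}\, dU_2(s) = \sum_{k=0}^{n}\pi_k^{(1)}\pi_{n-k}^{(2)};$

the last expression represents the convolution of $\{\pi_n^{(1)}\}$ and $\{\pi_n^{(2)}\}$.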

Neyman's distributions of type A with two parameters are special cases of a
compound Poisson process where U(a) is a step function with jumps at equidistant
places, the jumps being given by a simple Poisson distribution $\{\pi(n; \lambda)\}$.
Now the convolution of two such distributions is again a simple Poisson distribution
$\{\pi(n; 2\lambda)\}$ with jumps at the same places; hence the convolution of two
distributions of type A is again a similar distribution with one parameter doubled.
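The doubling property can be checked directly: convolve the type A probabilities (2.7) with parameters (λ, c) with themselves and compare with the type A probabilities for (2λ, c). The helper names and the parameter values below are our own; the truncations in n and k are chosen so that the neglected terms are far below the tolerance.

```python
import math


def neyman_type_a_pmf(lam, c, nmax, kmax=100):
    """pi_n of (2.7) for n = 0..nmax (sum over k truncated at kmax)."""
    pmf = []
    for n in range(nmax + 1):
        s = 0.0
        for k in range(kmax + 1):
            a = k * c
            s += (math.exp(-lam) * lam ** k / math.factorial(k)
                  * math.exp(-a) * a ** n / math.factorial(n))
        pmf.append(s)
    return pmf


def convolve(p, q):
    """Distribution of X + Y for independent X ~ p, Y ~ q, for n < len(p)."""
    return [sum(p[k] * q[n - k] for k in range(n + 1)) for n in range(len(p))]


lam, c, nmax = 1.2, 0.8, 40
single = neyman_type_a_pmf(lam, c, nmax)
doubled = neyman_type_a_pmf(2 * lam, c, nmax)
max_err = max(abs(x - y) for x, y in zip(convolve(single, single), doubled))
print(max_err)
```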

As mentioned before, the notion of a compound Poisson distribution is due to 
Greenwood and Yule [6]. The time dependent compound Poisson process has 
been the object of detailed investigations by J. Dubourdieu [2] and O. Lundberg 
[9]. The latter has discussed also the problem of fitting the compound Poisson 
process to empirical distributions. 


3. The generalized Poisson distribution. Let F(x) be an arbitrary c.d.f.
Then its n-fold convolution $F^{n*}(x)$ (cf. (1.5)) may be considered as a c.d.f.
depending on a parameter n. Choosing, for the latter, the simple Poisson
distribution (2.1) and performing the operation indicated in (1.1), we arrive at the
c.d.f. of the generalized Poisson law

(3.1)  $G(x) = \sum_{n=0}^\infty e^{-a}\frac{a^n}{n!}\, F^{n*}(x).$

If, in particular, F(x) is the unitary function (1.6), we have the ordinary Poisson
law

(3.2)  $G(x) = \sum_{n=0}^\infty e^{-a}\frac{a^n}{n!}\, E(x - n) = \sum_{0 \le n \le x} e^{-a}\frac{a^n}{n!}$

in its cumulative form.

The most frequently encountered application of the generalized Poisson dis- 
tribution is to problems of the following type. Consider independent random 
events for which the simple Poisson distribution may be assumed, such as: 
telephone calls, the occurrence of claims in an insurance company, fire accidents, 
sickness, and the like. With each event there may be associated a random 
variable X. Thus, in the above examples, X may represent the length of the
ensuing conversation, the sum under risk, the damage, the cost (or length) of 
hospitalization, respectively. To mention an interesting example of a different 
type, A. Einstein Jr. [5] and G. Polya [13, 14] have studied a problem arising out 
of engineering practice connected with the building of dams, where the events 
consist of the motions of a stone at the bottom of a river; the variable X is the 
distance through which the stone moves down the river. 

Now, if F(x) is the c.d.f. of the variable X associated with a single event, then
$F^{n*}(x)$ is the c.d.f. of the accumulated variable associated with n events. Hence
(3.1) is the probability law of the sum of the variables (sum of the conversation 
times, total sum paid by the company, total damage, total distance travelled by 
the stone, etc.). 

In view of the above examples, it is not surprising that the law (3.1), or special 
cases of it, have been discovered, by various means and sometimes under dis- 
guised forms, by many authors. Quite recently Satterthwaite [16] was led to it 
(in the above simple form) from problems in insurance. Related (but less ele- 
gant) considerations may be found in a paper by W. G. Ackermann [1]. Simple 
as they are, the above considerations leading to (3.1) furnish a complete solution 
of the problem in all the cases mentioned. Unfortunately, the special features 
of the problems often so overshadow the essential point, that one is often led to 
unnecessarily complicated and incomplete solutions. As an example of the diffi- 
culties in considering special cases we mention that Polya [13, 14] was led to a 
partial differential equation of the hyperbolic type, which conceals the elementary 
nature of the problem. 

If F(x) is itself a Poisson c.d.f., (3.1) reduces to (2.7). Thus Neyman's distribution
of type A depending on two parameters is both a compound and a generalized
Poisson distribution. We shall later on see that the generalized Poisson distribution
plays an even more important rôle in Neyman's theory.

The main properties of (3.1) are easily derived using characteristic functions.
If $\varphi(z)$ is the characteristic function of F(x), the characteristic function of G(x)
is

(3.3)  $\psi(z) = \sum_{n=0}^\infty e^{-a}\frac{a^n}{n!}\,\varphi^n(z) = e^{a(\varphi(z) - 1)}.$

Accordingly the r-th semi-invariant of G(x) equals the r-th moment of F(x) multiplied
by a. Moreover it is readily seen that the r-fold convolution of G(x) with
itself is again a function of type (3.1), only with a replaced by ra. Neyman's
Proposition II is a special case of this remark.
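Both consequences of (3.3) can be illustrated for an integer-valued F. The sketch below builds the point masses of the generalized Poisson law (3.1) by repeated convolution of F, then checks that the first three semi-invariants equal a times the first three moments of F, and that convolving G with itself reproduces the law with a replaced by 2a. The choice of F (mass on 1, 2, 3), the value of a and the truncation limits are hypothetical.

```python
import math


def generalized_poisson_pmf(a, f, xmax, nmax=60):
    """Point masses g[x] = G(x) - G(x-1), x = 0..xmax, of (3.1) when F is
    integer-valued with point masses f[0], f[1], ...; Poisson sum cut at nmax."""
    conv = [1.0] + [0.0] * xmax            # masses of F^{0*}: unit mass at 0
    g = [0.0] * (xmax + 1)
    for n in range(nmax + 1):
        weight = math.exp(-a) * a ** n / math.factorial(n)
        for x in range(xmax + 1):
            g[x] += weight * conv[x]
        # masses of F^{(n+1)*}: convolve once more with F, truncated at xmax
        conv = [sum(conv[x - j] * f[j] for j in range(min(x, len(f) - 1) + 1))
                for x in range(xmax + 1)]
    return g


def cumulants(g):
    """First three semi-invariants: mean, variance, third central moment."""
    k1 = sum(x * p for x, p in enumerate(g))
    k2 = sum((x - k1) ** 2 * p for x, p in enumerate(g))
    k3 = sum((x - k1) ** 3 * p for x, p in enumerate(g))
    return k1, k2, k3


a = 2.5
f = [0.0, 0.5, 0.3, 0.2]                    # X takes the values 1, 2, 3
g = generalized_poisson_pmf(a, f, xmax=150)
moments = [sum(x ** r * p for x, p in enumerate(f)) for r in (1, 2, 3)]
print(cumulants(g), [a * mu for mu in moments])

# Two-fold convolution of G with itself versus the law with a replaced by 2a.
g2 = generalized_poisson_pmf(2 * a, f, xmax=150)
self_conv = [sum(g[k] * g[x - k] for k in range(x + 1)) for x in range(151)]
```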

4. Neyman's contagious distributions. As an illustration of the general 
applicability of the operation (1.1) we shall consider the typical example treated 
by Neyman. Consider the distribution of larvae in a field. The field is divided 
into plots of equal areas and we are interested in the probability $c_k$ that exactly
k larvae are found in a certain plot. Now we assume with Neyman:

(i) The larvae may come from various litters. It is assumed that the probability
that exactly ν litters are represented on our plot is given by the simple
Poisson distribution¹ (2.1). (ii) The probability that there are exactly n
survivors is the same for all litters and will be denoted by p(n). (iii) If, in any
particular litter, there are exactly n survivors, the probability that k of them are
found on the plot under observation is given by the binomial distribution. We
shall write the latter in its cumulative form

(4.1)  $B(x, n, u) = \sum_{k=0}^{n}\binom{n}{k} u^k (1 - u)^{n-k} E(x - k)$

(cf. (1.6)). (iv) The parameter u in (4.1) is characteristic for any particular
litter (and varies, in particular, with the position of the litter relative to the
particular plot under observation). The c.d.f. of u (which characterizes the
distribution of litters in the field) is supposed to be known and will be denoted by F(u).
The litters are statistically independent.

Now for any particular litter the probability that at most k survivors will be
in the plot under observation is given by

(4.2)  $L(k, u) = \sum_{n=0}^\infty p(n)\, B(k, n, u),$

which is a special case of (1.2). Here u is the parameter for the litter picked out.
Accordingly, the probability that at most k survivors from any one litter will be
found on our plot is

(4.3)  $L(k) = \int_0^1 L(k, u)\, dF(u),$

and this is the second application of the operation (1.1). Since any number of 
litters may be represented on our plot, the final expression for the probability 

¹ Actually Neyman at first assumes the number of litters in the field to be finite and
considers therefore the binomial instead of the Poisson distribution. Later, however, a 
passage to the limit is performed which is equivalent to the above assumption. It will be 
seen that in the following consideration the Poisson distribution may be replaced by any 
other distribution. 



“contagious” distributions 




that at most k larvae will be found on our plot is obtained in the form of a 
generalized Poisson c.d.f. 

(4.4)  $C(k) = \sum_{n=0}^\infty e^{-a}\frac{a^n}{n!}\, L^{n*}(k).$

This is the desired c.d.f. For the desired probability $c_k$ we have $c_k = C(k) - C(k - 1)$.

We now specialize, with Neyman, the assumption (ii) to the effect that the
distribution {p(n)} is a Poisson distribution

(4.5)  $p(n) = e^{-\lambda}\frac{\lambda^n}{n!}.$

The distribution (4.2) then becomes the c.d.f. of a generalized Poisson
distribution, since $B(x, n, u) = B^{n*}(x, 1, u)$.

The simplest special case arises when all litters are characterized by the same
value of the parameter, say $u = u_0$. Then $F(u) = E(u - u_0)$.

Writing $L'(k) = L(k) - L(k - 1)$ for the probability that exactly k survivors
from any one litter will be found on our plot, we have

(4.6)  $L'(k) = \sum_{n=k}^\infty e^{-\lambda}\frac{\lambda^n}{n!}\binom{n}{k} u_0^k (1 - u_0)^{n-k} = e^{-\lambda u_0}\frac{(\lambda u_0)^k}{k!}.$

The c.d.f. (4.4) then reduces to the form (2.7). Similarly, when F(u) is a step
function we arrive at Neyman's more-parametric distributions of type A.
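The identity (4.6) says that binomially thinning a Poisson number of survivors gives again a Poisson law. The sketch below sums the middle expression of (4.6) term by term and compares it with the closed form; the function name, the cut-off nmax and the values of λ and $u_0$ are illustrative assumptions. It relies on `math.comb`, available in Python 3.8 and later.

```python
import math


def litter_on_plot(k, lam, u0, nmax=150):
    """L'(k) of (4.6) summed term by term: Poisson(lam) survivors (4.5), each
    found on the plot with probability u0 (binomial law (4.1))."""
    return sum(math.exp(-lam) * lam ** n / math.factorial(n)
               * math.comb(n, k) * u0 ** k * (1 - u0) ** (n - k)
               for n in range(k, nmax + 1))


lam, u0 = 4.0, 0.3
for k in range(6):
    direct = litter_on_plot(k, lam, u0)
    closed = math.exp(-lam * u0) * (lam * u0) ** k / math.factorial(k)
    print(k, direct, closed)
```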

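If F(u) = u for 0 ≤ u ≤ 1 (rectangular distribution), then $\int_0^1 B(k, n, u)\, dF(u)$ has
only jumps of magnitude 1/(n + 1), and

(4.7)  $L'(k) = \sum_{n=k}^\infty \frac{p(n)}{n + 1} = \sum_{n=k}^\infty e^{-\lambda}\frac{\lambda^n}{(n + 1)!}.$

This leads to Neyman's function of type B. The characteristic function of
(4.7) is readily seen to be

$\frac{1}{\lambda}\,\frac{e^{\lambda(e^{iz} - 1)} - 1}{e^{iz} - 1},$

so that the characteristic function of the final c.d.f. C(k) becomes

$\exp\left\{a\left[\frac{e^{\lambda(e^{iz} - 1)} - 1}{\lambda(e^{iz} - 1)} - 1\right]\right\},$

in agreement with Neyman's formula.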
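For the rectangular case, the per-litter law follows from the jumps of size 1/(n + 1): L'(k) is the sum over n ≥ k of $e^{-\lambda}\lambda^n/(n+1)!$. The sketch below computes these point masses, checks that they sum to one with mean λ/2 (half of the expected survivors, since u is uniform on (0, 1)), and compares their characteristic function with $(e^{\lambda(e^{iz}-1)} - 1)/(\lambda(e^{iz} - 1))$ at a few values of z. The helper names, λ and the cut-offs are our own choices.

```python
import cmath
import math


def type_b_litter_pmf(lam, kmax, nmax=120):
    """L'(k), k = 0..kmax: sum over n >= k of e^{-lam} lam^n / (n+1)!."""
    return [sum(math.exp(-lam) * lam ** n / math.factorial(n + 1)
                for n in range(k, nmax + 1))
            for k in range(kmax + 1)]


def char_fn_from_pmf(pmf, z):
    """sum_k L'(k) e^{ikz}, computed directly from the point masses."""
    return sum(p * cmath.exp(1j * k * z) for k, p in enumerate(pmf))


def char_fn_closed(lam, z):
    """(e^{lam (w - 1)} - 1) / (lam (w - 1)) with w = e^{iz}; needs w != 1."""
    w = cmath.exp(1j * z)
    return (cmath.exp(lam * (w - 1)) - 1) / (lam * (w - 1))


lam = 3.0
pmf = type_b_litter_pmf(lam, kmax=120)
print(sum(pmf), sum(k * p for k, p in enumerate(pmf)), lam / 2)
print(char_fn_from_pmf(pmf, 0.7), char_fn_closed(lam, 0.7))
```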
6. The nature of contagion. It is well known that the simple Poisson
distribution describes mutually independent events; in other words, with a Poisson
distribution the numbers of events in two non-overlapping time intervals are 
uncorrelated and the occurrence of an event has no influence on the probability 
of occurrence of further events. Accordingly, the compound Poisson process 
also applies to independent and not contagious events. With really contagious 
events (as, for example, with epidemics) the occurrence of each event increases 
(or decreases) the probability of further events. Greenwood and Yule [6] de- 
veloped a very general scheme for such events but, due to the very generality, 
their formulas became too complex for practical applications. They considered 
the compound Poisson process, and, in particular, the Polya distribution (2.5), 
as an alternative hypothesis. Accordingly, they interpreted the good fit of that 
distribution to accident statistics as indicating that there was no contagion but 
that proneness to accidents varies with the person. 

Considering a very similar problem, Polya and Eggenberger were later on led 
to consider a special model of true contagion. This turns out to be the simplest 
case of the general Greenwood-Yule scheme, but this had been overlooked by 
them. Curiously enough, Polya was led exactly to the distribution (2.5) which 
Greenwood and Yule found as an alternative to contagion. It is therefore seen 
that, contrary to a widespread opinion, an excellent fit of Polya's distribution to
observations is not necessarily indicative of any phenomenon of contagion in the
mechanism behind the observed distribution. In order to decide whether or not
there is contagion, it is not sufficient to consider the distribution of events, but
a detailed study of the correlation between various time intervals is necessary.²

The double interpretation of Polya's distribution leads to an understanding
of the compound Poisson distribution. To the observer the compound Poisson
distribution will always appear contagious; however, this contagion is not
inherent in any phenomenon in nature, but simply in our method of sampling. As a
matter of fact, with a compound Poisson distribution the parameter a is a
random variable.³ Its a priori c.d.f. in the total population is Prob{a ≤ x} = U(x).
Now if, for any particular sample, the observed number of events is n, then the
a posteriori c.d.f. of a in that sample is given by

$\text{Prob}\{a \le x \mid n\} = \frac{\int_0^x e^{-a} a^n\, dU(a)}{\int_0^\infty e^{-a} a^n\, dU(a)}.$

² For such studies cf. Newbold [10] and Lundberg [9]. For some generalizations of the
Polya-Eggenberger scheme see Kitagawa [7] and Rosenblatt [15].

³ It will be noticed that here a is actually a random characteristic in the population and
can be sampled. We are therefore not guilty of the absurdity which is usually connected
with the unfortunate use of Bayes' theorem, when a constant is regarded as a random
variable. If the output of a machine is distributed according to a Poisson distribution, its
parameter is a constant, characteristic of that machine. Regarding it as a random variable
means considering the collective of non-existing similar machines and making predictions
for them, whereas we are interested in the one machine only.
This is additional information enabling us to make better predictions for the
future or estimates of other properties of the sample. For example, if n is very 
large, there is a considerable probability that the mean of a in the sample exceeds 
that of the total population: accordingly, we shall expect that also in the future 
the number of events in our sample will be comparatively large. In other 
words, although the events themselves are strictly independent we have an 
apparent contagion due to our method of observation. 
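The apparent contagion can be seen numerically by applying the a posteriori formula to a step-function U. In the sketch below (the jump points and weights are hypothetical), the posterior mean of a is computed for n = 0, 1, ..., 9 observed events: it rises with n, starting below and ending above the population mean, so a sample that showed many events is expected to keep showing many, although every individual event is independent.

```python
import math


def posterior(n, points, weights):
    """A posteriori masses of a given n events: prior jump w at a times the
    Poisson likelihood e^{-a} a^n / n!, renormalized (Bayes' rule)."""
    joint = [w * math.exp(-a) * a ** n / math.factorial(n)
             for a, w in zip(points, weights)]
    total = sum(joint)
    return [j / total for j in joint]


points, weights = [0.5, 2.0, 6.0], [0.3, 0.5, 0.2]   # hypothetical step-function U
prior_mean = sum(a * w for a, w in zip(points, weights))
post_means = [sum(a * q for a, q in zip(points, posterior(n, points, weights)))
              for n in range(10)]
print(prior_mean, [round(x, 3) for x in post_means])
```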

It is hardly necessary to point out that the contagion studied by Neyman is 
of the type just described. Any inhomogeneity of a population of type (1.1) 
will lead to such an apparent contagion. However, that the Polya-Eggenberger 
distribution is a member of our class of contagious distributions must be regarded
as an accident and due to the possibility of its being interpreted as a compound
Poisson distribution. 

REFERENCES

[1] W. G. Ackermann, "Eine Erweiterung des Poissonschen Grenzwertsatzes und ihre
Anwendung auf die Risikoprobleme der Sachversicherung," Schriften des Mathematischen
Instituts und des Instituts für Angewandte Mathematik, Univ. Berlin, Vol. 4 (1939),
pp. 211-255.

[2] J. Dubourdieu,⁴ "Les fonctions absolument monotones et la théorie mathématique de
l'assurance-accidents," Comptes Rendus de l'Acad. Sc., Paris, Vol. 206 (1938),
pp. 303-305, 556-557.

[3] F. Eggenberger, "Die Wahrscheinlichkeitsansteckung," Mitteilungen der Vereinigung
Schweizerischer Versicherungs-Mathematiker, 1924, pp. 31-144.

[4] F. Eggenberger and G. Polya, "Über die Statistik verketteter Vorgänge," Zeitschrift
für Angewandte Mathematik und Mechanik, Vol. 1 (1923), pp. 279-289.

[5] A. Einstein, Jr., "Der Geschiebetrieb als Wahrscheinlichkeitsproblem," Mitteilungen
der Versuchsanstalt für Wasserbau an der Eidgenössischen Technischen Hochschule,
Zürich, 1937, pp. 3-112.

[6] M. Greenwood and G. Udny Yule, "An inquiry into the nature of frequency distributions
representative of multiple happenings with particular reference to the occurrence
of multiple attacks of disease or of repeated accidents," J. Roy. Stat. Soc., Vol. 83
(1920), pp. 255-279.

[7] T. Kitagawa, "The limit theorems of the stochastic contagious processes," Mem.
Faculty of Sc., Kyūsyū Imperial University, A, Vol. 1 (1941), pp. 167-194.

[8] T. Kitagawa and S. Huruya, "The application of the limit theorems of the contagious
stochastic processes to the contagious diseases," Mem. Faculty of Sc., Kyūsyū
Imperial University, A, Vol. 1 (1941), pp. 195-207.

[9] O. Lundberg, On Random Processes and their Application to Sickness and Accident
Statistics, Thesis, University of Stockholm, 1940.

[10] E. Newbold, "Practical applications of the statistics of repeated events, particularly
to industrial accidents," J. Roy. Stat. Soc., Vol. 90 (1927), pp. 487-547.

[11] J. Neyman, "On a new class of 'contagious' distributions, applicable in entomology
and bacteriology," Annals of Math. Stat., Vol. 10 (1939), pp. 35-57.

[12] C. Palm, "Inhomogeneous telephone traffic in full-availability groups," Ericsson
Technics, 1937, no. 1 (Stockholm), pp. 1-36.

⁴ A book by J. Dubourdieu, Théorie de l'assurance-maladie, Paris, 1939, has been
announced, but was not available to the present writer. It presumably treats the compound
Poisson distribution more fully than the short notes quoted above.





W. FELLER 


[13] G. Polya, "Zur Kinematik der Geschiebebewegung," Mitteilungen der Versuchsanstalt
für Wasserbau an der Eidgenössischen Technischen Hochschule, Zürich, 1937.

[14] G. Polya, "Sur la promenade au hasard dans un réseau de rues," Actualités Scientifiques
et Industrielles, No. 734 (1938), pp. 25-44.

[15] A. Rosenblatt, "Sur le concept de contagion de M. G. Pólya dans le calcul des
probabilités. Applications à la peste bubonique au Pérou," Actas Academ. Ciencias
Lima, Vol. 3 (1940), pp. 186-204.

[16] F. E. Satterthwaite, "Generalized Poisson distribution," Annals of Math. Stat.,
Vol. 13 (1942), pp. 410-417.



ON THE CONSTRUCTION OF SETS OF ORTHOGONAL LATIN SQUARES¹

By H. B. Mann 
Columbia University 

1. Introduction. 

An m-sided Latin square is an arrangement of m symbols into a square in such 
a way that no row and no column contains any symbol twice. Two Latin 
squares are called orthogonal if, when one is superimposed upon the other, every 
pair of symbols occurs exactly once. For instance the squares

A B C        α β γ
B C A        γ α β
C A B        β γ α

are orthogonal. The resulting square is

Aα  Bβ  Cγ
Bγ  Cα  Aβ
Cβ  Aγ  Bα

A pair of orthogonal Latin squares is called a Graeco-Latin square. A method
has not yet been found by which all possible sets of mutually orthogonal squares
can be constructed. However, methods are available for constructing certain
special sets, and although we cannot obtain all possible sets with these methods
they yield a great variety of designs.

To understand these methods we shall have to use certain fundamental con- 
cepts of the theory of numbers. In the following we shall deal therefore only 
with integers and all symbols used will denote only integers. 

Let a, b, m denote certain integers. We say

$a \equiv b \;(m)$

(in words, a is congruent to b modulo m) if a - b is divisible by m.

Such congruences can be treated like equations. For instance: if $a \equiv b \;(m)$,
then $a \pm c \equiv b \pm c \;(m)$ and $ac \equiv bc \;(m)$. The proofs of these statements are
obvious from the definition of $a \equiv b \;(m)$.

If $a \equiv b \;(m)$ and $c \equiv d \;(m)$, then $ac \equiv bd \;(m)$ and $a \pm c \equiv b \pm d \;(m)$.
Proof: According to our definition we have

$a - b = \lambda_1 m, \qquad a = b + \lambda_1 m,$
$c - d = \lambda_2 m, \qquad c = d + \lambda_2 m.$

Hence $ac = bd + m(\lambda_2 b + \lambda_1 d + \lambda_1 \lambda_2 m)$ and $a \pm c = b \pm d + m(\lambda_1 \pm \lambda_2)$, so
ac - bd and (a ± c) - (b ± d) are divisible by m.

¹ An expository paper presented, at the invitation of the program committee, on
September 12, 1943 at the Sixth Summer Meeting of the Institute, at the New Jersey College
for Women, Rutgers University, New Brunswick, N. J.

We have to be more careful with the division of congruences but we shall 
prove the following rule. 

If a is prime to m and $ab \equiv ac \;(m)$, then $b \equiv c \;(m)$.

Proof: $a(b - c) = \lambda m$, by hypothesis. The right side of this equation is
divisible by m, hence so is the left side. Since a is prime to m, b - c must be
divisible by m.

This rule means that we may cancel as in an ordinary equation as long as the
cancelled factor is prime to the modulus.

Every number is congruent to one of the numbers 0, 1, 2, ..., m - 1, because
if a is any number we can find a number b such that $0 \le a - bm \le m - 1$.

We shall now add, subtract and multiply mod m. That means we add, subtract
and multiply in the ordinary way but shall always replace every number
by its smallest non-negative remainder. Thus, for instance,

$2 + 4 \equiv 1 \;(5), \qquad 2 \cdot 4 \equiv 3 \;(5).$


2. Complete sets of m-sided orthogonal Latin squares, where m is prime.

Now let p be a prime number. We write down the following design

$L_j = \begin{array}{cccc} 0 & 1 & \cdots & p-1 \\ j & 1+j & \cdots & p-1+j \\ 2j & 1+2j & \cdots & p-1+2j \\ \vdots & \vdots & & \vdots \\ (p-1)j & 1+(p-1)j & \cdots & p-1+(p-1)j \end{array} \qquad (0 < j \le p - 1),$

where all expressions are to be taken mod p, that is, we replace every number in
this square by its smallest remainder mod p. We shall show that $L_j$ is a Latin
square. Here the rows and columns are numbered from 0 to p - 1. Assume
that the kth row ($0 \le k \le p - 1$) contains a number twice. Then we would
have

$a + kj \equiv b + kj \;(p)$ with $a \not\equiv b \;(p)$.

But from this we obtain $a \equiv b \;(p)$, which is a contradiction. Now assume that
a column contains a number twice. Then we would have

$a + kj \equiv a + k'j \;(p)$, with $k \not\equiv k' \;(p)$,

but from this we have

$kj \equiv k'j \;(p)$,

and since j is prime to p,

$k \equiv k' \;(p)$,

which is again a contradiction.
We can obtain p - 1 such Latin squares corresponding to the p - 1 values
which j can take.

We shall show that $L_i$ is orthogonal to $L_j$ if $i \ne j$. If this were not true we
would have the same pair of numbers occurring in two different boxes of the
square which results from the superimposition of $L_i$ on $L_j$. Let mn be such a
pair and assume that it occurs in the αth row and βth column and in the γth row
and δth column of the resulting square. Then m would occur in $L_j$ in the αth
row and βth column and in the γth row and δth column. Hence we would have

(i)  $\beta + \alpha j \equiv \delta + \gamma j \;(p)$,

and similarly

(ii)  $\beta + \alpha i \equiv \delta + \gamma i \;(p)$.

If we subtract the second congruence from the first we obtain

$\alpha(j - i) \equiv \gamma(j - i) \;(p)$,

but j < p and i < p and $j \ne i$. Hence $j - i \not\equiv 0 \;(p)$ and we may therefore
divide by (j - i). This gives

$\alpha \equiv \gamma \;(p)$.

Substituting this in (i) we obtain

$\beta \equiv \delta \;(p)$.

Hence the two boxes must be the same. We have therefore the following
theorem:

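Theorem 1: If p is a prime number and

$L_j = \begin{array}{cccc} 0 & 1 & \cdots & p-1 \\ j & 1+j & \cdots & p-1+j \\ \vdots & \vdots & & \vdots \\ (p-1)j & 1+(p-1)j & \cdots & p-1+(p-1)j \end{array} \pmod p,$

then $L_1, L_2, \cdots, L_{p-1}$ is a set of p - 1 orthogonal Latin squares.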

As an application we can write down a set of 4 orthogonal Latin squares of
side 5:

   $L_1$          $L_2$          $L_3$          $L_4$
0 1 2 3 4    0 1 2 3 4    0 1 2 3 4    0 1 2 3 4
1 2 3 4 0    2 3 4 0 1    3 4 0 1 2    4 0 1 2 3
2 3 4 0 1    4 0 1 2 3    1 2 3 4 0    3 4 0 1 2
3 4 0 1 2    1 2 3 4 0    4 0 1 2 3    2 3 4 0 1
4 0 1 2 3    3 4 0 1 2    2 3 4 0 1    1 2 3 4 0

A further simplification can be achieved if we know a primitive root mod p.
A primitive root is a remainder a mod p such that every remainder except 0
is equal to a power of a mod p. For example, 3 is a primitive root mod 7, for

$3^0 \equiv 1,\ 3^1 \equiv 3,\ 3^2 \equiv 2,\ 3^3 \equiv 6,\ 3^4 \equiv 4,\ 3^5 \equiv 5 \;(7).$

For any number a prime to p we must have $a^{p-1} \equiv 1 \;(p)$. We will prove this
congruence for primitive roots only, since we do not need the general case. Let a be a
primitive root and assume that

$a^{p-1} \equiv b \equiv a^g \;(p)$, with 0 < g < p - 1.

Then we would have

$a^{p'} \equiv 1 \;(p)$, with $p' = p - 1 - g < p - 1$.

Hence we can obtain at most p - 2 different remainders $a^0, a^1, \cdots, a^{p'-1}$, and a
would not be a primitive root.

We now form

$L_i = \begin{array}{cccc} 0 & 1 & \cdots & p-1 \\ a^{0+i} & 1 + a^{0+i} & \cdots & p-1+a^{0+i} \\ a^{1+i} & 1 + a^{1+i} & \cdots & p-1+a^{1+i} \\ \vdots & \vdots & & \vdots \\ a^{p-2+i} & 1 + a^{p-2+i} & \cdots & p-1+a^{p-2+i} \end{array} \qquad (i = 0, 1, \cdots, p-2).$

Exactly as in the case of the $L_j$ of Theorem 1 it can be shown that $L_i$ is
orthogonal to $L_j$ if $i \ne j$. For $1 < k \le p - 1$ the kth row of $L_i$ equals the
(k - 1)st row of $L_{i+1}$, and since $a^{p-1} \equiv 1 \;(p)$ the last row of $L_{i+1}$ equals
row 1 of $L_i$ (rows again numbered from 0). Hence $L_{i+1}$ is obtained from $L_i$ by a
cyclic permutation of the last (p - 1) rows. It is then only necessary to construct the
first square; the others can be obtained by a cyclic permutation of the last (p - 1)
rows. We shall exemplify this by constructing a set of 6 seven-sided orthogonal squares.




Li 






Lt 






Li 



0 

1 

2 3 4 

5 

6 

0 

1 

2 

3 4 

6 

6 

0 

1 

2 

3 4 

5 

6 

1 

2 

3 4 5 

6 

0 

3 

4 

5 

6 0 

1 

2 

2 

3 

4 

5 6 

0 

1 

3 

4 

5 6 0 

1 

2 

2 

3 

4 

5 6 

0 

1 

6 

0 

1 

2 3 

4 

6 

2 

3 

4 5 6 

0 

1 

6 

0 

1 

2 3 

4 

5 

4 

5 

6 

0 1 

2 

3 

6 

0 

1 2 3 

4 

5 

4 

5 

6 

0 1 

2 

3 

5 

6 

0 

1 2 

3 

4 

4 

5 

6 0 1 

2 

3 

5 

6 

0 

1 2 

3 

4 

1 

2 

3 

4 5 

6 

0 

5 

6 

0 1 2 

3 

4 

1 

2 

3 

4 5 

6 

0 

3 

4 

5 

6 0 

1 

2 



Li 






Li 






Li 



0 

1 

2 3 4 

5 

6 

0 

1 

2 

3 4 

5 

6 

0 

1 

2 

3 4 

5 

6 

6 

0 

1 2 3 

4 

5 

4 

5 

6 

0 1 

2 

3 

5 

6 

0 

1 2 

3 

4 

4 

5 

6 0 1 

2 

3 

5 

6 

0 

1 2 

3 

4 

1 

2 

3 

4 5 

6 

0 

5 

6 

0 12 

3 

4 

1 

2 

3 

4 5 

6 

0 

3 

4 

5 

6 0 

1 

2 

1 

2 

3 4 5 

6 

0 

3 

4 

5 

6 0 

1 

2 

2 

3 

4 

5 6 

0 

1 

3 

4 

5 6 0 

1 

2 

2 

3 

4 

6 6 

0 

1 

6 

0 

1 

2 3 

4 

5 

2 

3 

4 5 6 

0 

1 

6 

0 

1 

2 3 

4 

6 

4 

5 

6 

0 1 

2 

3 
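The primitive-root construction and its cyclic shortcut can be verified mechanically. The sketch below (function names are our own) builds $L_i$ for p = 7 with the primitive root 3: row 0 is 0, 1, ..., 6 and row k ≥ 1 is that row shifted by $3^{k-1+i}$ mod 7. It then checks that each square arises from the previous one by cyclically permuting its last six rows, that $L_0$ matches the table above, and that all six squares are Latin and mutually orthogonal.

```python
def cyclic_square(p, g, i):
    """L_i of the primitive-root construction: row 0 is 0, 1, ..., p-1 and
    row k >= 1 is that row shifted by g^(k-1+i) (mod p)."""
    rows = [list(range(p))]
    for k in range(1, p):
        shift = pow(g, k - 1 + i, p)
        rows.append([(a + shift) % p for a in range(p)])
    return rows


def is_latin(square):
    n = len(square)
    symbols = set(range(n))
    return (all(set(row) == symbols for row in square)
            and all({row[c] for row in square} == symbols for c in range(n)))


def are_orthogonal(s, t):
    n = len(s)
    return len({(s[r][c], t[r][c]) for r in range(n) for c in range(n)}) == n * n


p, g = 7, 3
squares = [cyclic_square(p, g, i) for i in range(p - 1)]      # L_0, ..., L_5
# L_{i+1} should be L_i with its last p - 1 rows cyclically permuted.
cyclic_ok = all(squares[i + 1] == [squares[i][0]] + squares[i][2:] + [squares[i][1]]
                for i in range(p - 2))
print(cyclic_ok)
```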
In the theory of numbers it is shown that a primitive root exists for every
prime number. If p is not too large a primitive root can easily be found by trial
and error. We give a list of primitive roots for all odd primes under 30:

Prime number      3   5   7   11   13   17   19   23   29
Primitive root    2   2   3    2    2    3    2    5    2
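The trial-and-error search mentioned above is a short loop: g is a primitive root mod p exactly when its powers $g^0, \ldots, g^{p-2}$ give all p - 1 non-zero remainders. The sketch below (our own helper names, intended for odd primes) reproduces the table and also lists the powers of 2 mod 11 that are computed by hand in the next paragraph. It uses Python's built-in three-argument `pow` for modular powers.

```python
def is_primitive_root(g, p):
    """True if g^0, g^1, ..., g^(p-2) give all p - 1 non-zero remainders mod p."""
    return len({pow(g, e, p) for e in range(p - 1)}) == p - 1


def smallest_primitive_root(p):
    """Trial and error over g = 2, 3, ... (intended for odd primes p)."""
    for g in range(2, p):
        if is_primitive_root(g, p):
            return g
    return None


table = {p: smallest_primitive_root(p) for p in (3, 5, 7, 11, 13, 17, 19, 23, 29)}
print(table)
print([pow(2, e, 11) for e in range(10)])   # shifts for the 11-sided square
```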


In computing the first column of the first square it is not necessary to actually
compute all powers of the primitive root. We can take advantage of the fact that a
congruence may be multiplied by a number. Thus, for instance, for the first
column of the 11-sided square we have

$2^0 \equiv 1,\ 2^1 \equiv 2,\ 2^2 \equiv 4,\ 2^3 \equiv 8,\ 2^4 \equiv 5,\ 2^5 \equiv 2 \cdot 5 \equiv 10 \;(11),$

but $10 \equiv -1 \;(11)$; hence we have without further computation

$2^6 \equiv -2 \equiv 9,\ 2^7 \equiv -4 \equiv 7,\ 2^8 \equiv -8 \equiv 3,\ 2^9 \equiv -5 \equiv 6 \;(11).$


3. Complete sets of m-sided orthogonal Latin squares, where m is the power
of a prime.

We have seen that we can always construct m - 1 orthogonal Latin squares
if m is a prime number. We shall show how to construct m - 1 orthogonal
Latin squares if m is the power of a prime. However, if we need only a
Graeco-Latin square of side m and if m is odd, then we can use the following theorem:

Theorem 2: If m is odd, then the squares (all entries taken mod m)

$\begin{array}{cccc} 0 & 1 & \cdots & m-1 \\ 1 & 1+1 & \cdots & m-1+1 \\ 2 & 1+2 & \cdots & m-1+2 \\ \vdots & \vdots & & \vdots \\ m-1 & 1+(m-1) & \cdots & (m-1)+(m-1) \end{array} \qquad \begin{array}{cccc} 0 & 1 & \cdots & m-1 \\ 2 & 1+2 & \cdots & m-1+2 \\ 2 \cdot 2 & 1+2 \cdot 2 & \cdots & m-1+2 \cdot 2 \\ \vdots & \vdots & & \vdots \\ 2(m-1) & 1+2(m-1) & \cdots & m-1+2(m-1) \end{array}$

are orthogonal.
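Theorem 2 can be checked for many odd sides at once, including composite ones such as 9 and 15 where Theorem 1 does not apply. In the sketch below (helper names are ours) the two squares have entries a + k and a + 2k mod m in row k and column a. For even m the second square already fails to be Latin, since 2 is then not prime to m.

```python
def theorem2_pair(m):
    """The two squares of Theorem 2: entries a + k and a + 2k (mod m) in row k, column a."""
    first = [[(a + k) % m for a in range(m)] for k in range(m)]
    second = [[(a + 2 * k) % m for a in range(m)] for k in range(m)]
    return first, second


def is_latin(square):
    n = len(square)
    symbols = set(range(n))
    return (all(set(row) == symbols for row in square)
            and all({row[c] for row in square} == symbols for c in range(n)))


def are_orthogonal(s, t):
    n = len(s)
    return len({(s[r][c], t[r][c]) for r in range(n) for c in range(n)}) == n * n


for m in (3, 9, 15):
    first, second = theorem2_pair(m)
    print(m, is_latin(first), is_latin(second), are_orthogonal(first, second))
```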
The proof is similar to the proof of Theorem 1. We have to use the fact that
2 is prime to m.

We shall now prove the following statement: For every remainder $a \not\equiv 0 \;(p)$
there exists a remainder $a^{-1}$ such that $a \cdot a^{-1} \equiv 1 \;(p)$.

Proof: We form the sequence $a, a^2, \cdots, a^n, \cdots$. Since there is only a finite
number of remainders, there must exist 2 values i and j such that

$a^i \equiv a^j \;(p).$

Let i > j. Then since a is prime to p we may divide by $a^j$. Putting i - j = d,
we obtain

$a^{d-1} \cdot a \equiv 1 \;(p).$

Hence we may take $a^{-1} = a^{d-1}$ and our statement is proved. Thus we see that
the system of remainders mod p with respect to addition, as well as with respect
to multiplication if 0 is excluded, satisfies the following postulates:

(1) For every pair of elements A, B there is defined a product A·B within the
system such that for any 3 elements A, B and C

$A \cdot (B \cdot C) = (A \cdot B) \cdot C$ (associative law).

The "multiplication" may be any sort of composition; for example, either
addition or multiplication of remainder classes.

(2) There exists a unit element 1 such that

$A \cdot 1 = 1 \cdot A = A.$

(3) For every A in the system there exists an element $A^{-1}$ such that

$A \cdot A^{-1} = A^{-1} \cdot A = 1.$

The unit element will be 0 if we consider the remainder classes with addition
as composition. It will be 1 if multiplication is the composition. The inverse
of a is -a for the additive system and $a^{-1}$ for the multiplicative system.
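The existence proof for $a^{-1}$ is constructive and can be followed step by step. The sketch below, assuming p is prime (so every non-zero a is prime to p), runs through a, a², a³, ... mod p until a remainder repeats, $a^i \equiv a^j$ with i > j, and returns $a^{d-1}$ with d = i - j. The function name is hypothetical.

```python
def inverse_by_powers(a, p):
    """Follow the proof: run through a, a^2, a^3, ... (mod p) until a remainder
    repeats, a^i = a^j with i > j; with d = i - j, a^(d-1) is an inverse of a."""
    first_seen = {}
    power, exponent = a % p, 1
    while power not in first_seen:
        first_seen[power] = exponent
        power = (power * a) % p
        exponent += 1
    d = exponent - first_seen[power]
    return pow(a, d - 1, p)


print([inverse_by_powers(a, 7) for a in range(1, 7)])
```

In practice one would simply call `pow(a, -1, p)` (Python 3.8+), but the loop above mirrors the argument in the text.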

A system satisfying (1), (2) and (3) is called a group. The property A·B =
B·A is usually not postulated. If a group fulfills this condition, then it is called
a commutative group or an Abelian group. A group can be defined by its generating
elements. For example, let G be generated by the elements P, Q with
the relations $P^2 = 1$, $Q^3 = 1$ and $PQ = Q^2P$. We then obtain the elements of G
as 1, P, Q, PQ, $PQ^2$, $Q^2$. The rules for the multiplication can be written down in
the form of a table (the entry in the row beginning with X and the column beginning
with Y is the product XY):

1     P     Q     PQ    PQ²   Q²
P     1     PQ    Q     Q²    PQ²
Q     PQ²   Q²    P     PQ    1
PQ    Q²    PQ²   1     Q     P
PQ²   Q     P     Q²    1     PQ
Q²    PQ    1     PQ²   P     Q
By inspection one can see that, taking the elements of our group as symbols,
the multiplication table forms a Latin square. For instance, if we identify P
with 2, Q with 3, etc. (1 with 1, PQ with 4, PQ² with 5, Q² with 6), we obtain from
the table above

1 2 3 4 5 6
2 1 4 3 6 5
3 5 6 2 4 1
4 6 5 1 3 2
5 3 2 6 1 4
6 4 1 5 2 3

We shall prove that this is generally true. Let the group G consist of the
elements $A_1, \cdots, A_m$, with $A_1 = 1$. We write down the multiplication table of the group:

$\begin{array}{cccc} A_1 & A_2 & \cdots & A_m \\ A_2 & A_2 A_2 & \cdots & A_2 A_m \\ \vdots & \vdots & & \vdots \\ A_m & A_m A_2 & \cdots & A_m A_m \end{array}$

Suppose this is not a Latin square. Then an element will occur twice in at least
one row or at least one column, that is, we should have either

$A_i A_j = A_i A_k$, for $j \ne k$,

or $A_j A_i = A_k A_i$, for $j \ne k$.

Multiplying the first equation by $A_i^{-1}$ on the left, we obtain $A_j = A_k$. Hence
j = k. Similarly in the second case (multiplying by $A_i^{-1}$ on the right) j = k,
contrary to our assumption.
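The example group and its table can be reproduced by realizing P and Q as permutations of three objects, a choice of ours that the text does not make: P as a transposition and Q as a 3-cycle satisfy $P^2 = 1$, $Q^3 = 1$ and $PQ = Q^2P$. With the identification 1, P, Q, PQ, PQ², Q² to 1, ..., 6, the computed table should coincide with the numeric Latin square above, and every row and column should contain each symbol once, as the general argument shows.

```python
def compose(x, y):
    """Product x*y of permutations of {0, 1, 2} stored as tuples: apply y, then x."""
    return tuple(x[y[i]] for i in range(len(y)))


identity = (0, 1, 2)
P = (1, 0, 2)                     # a transposition, so P^2 = 1
Q = (1, 2, 0)                     # a 3-cycle, so Q^3 = 1
Q2 = compose(Q, Q)
relations_hold = (compose(P, P) == identity
                  and compose(Q, Q2) == identity
                  and compose(P, Q) == compose(Q2, P))        # PQ = Q^2 P

elements = [identity, P, Q, compose(P, Q), compose(P, Q2), Q2]   # 1, P, Q, PQ, PQ^2, Q^2
label = {x: n + 1 for n, x in enumerate(elements)}               # 1 -> 1, P -> 2, ...
table = [[label[compose(x, y)] for y in elements] for x in elements]
for row in table:
    print(*row)
```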

Two groups G and $\bar G$ are called isomorphic if we can map G into $\bar G$ in such a
way that the mapping is not disturbed by multiplication. That is, if A is mapped
on $\bar A$ and B on $\bar B$, and if AB = C and $\bar A \bar B = \bar C$, then C must be mapped on $\bar C$.
Such a mapping is called an isomorphism. If $G = \bar G$ then the mapping is called
an automorphism. For instance, if we consider the remainder system mod m
with addition as composition and j is any remainder, then the mapping $\bar a = ja$
is an automorphism. For if

$a + b \equiv c \;(m)$

then $aj + bj \equiv cj \;(m)$.

Some automorphisms establish a 1-to-1 correspondence between the elements
of G. For instance, in the above example, if j is prime to m the correspondence
is bi-unique (that is, only one element is mapped on any element of G), because if

$aj \equiv bj \;(m)$

and j is prime to m, then

$a \equiv b \;(m)$.
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If j is not prime to m, the mapping would not be unique although it would still 
be an automorphism. From now on we shall consider only automorphisms 
which establish a 1-to-l correspondence between the elements of G. 
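The example of the residues mod $m$ is easy to explore numerically. In the minimal sketch below, the function names are ours. It checks that for $m = 12$ every map $a \to ja$ respects addition, and that the map is bi-unique exactly when $\gcd(j, m) = 1$.

```python
from math import gcd


def image(j, m):
    """Images of the residues 0, 1, ..., m-1 under the map a -> j*a (mod m)."""
    return [(j * a) % m for a in range(m)]


def preserves_addition(j, m):
    """Check that j(a + b) agrees with ja + jb (mod m) for every pair of residues."""
    return all((j * ((a + b) % m)) % m == (j * a + j * b) % m
               for a in range(m) for b in range(m))


def is_one_to_one(j, m):
    return len(set(image(j, m))) == m


m = 12
for j in range(m):
    assert preserves_addition(j, m)                 # always compatible with addition
    assert is_one_to_one(j, m) == (gcd(j, m) == 1)  # bi-unique exactly when j is prime to m

print(image(5, 12))  # a permutation of 0..11
print(image(4, 12))  # only the multiples of 4 are hit
```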

Let $S$ be such an automorphism, and denote by $A^S$ the element into which $A$ is mapped under the automorphism $S$. We put $(A^S)^S = A^{S^2}$, $(A^{S^2})^S = A^{S^3}$, etc.

We also put $A^{S^0} = A$. We shall prove the following theorem:

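Let $S$ be an automorphism such that $S, S^2, \ldots, S^g$ map no element into itself except the element 1. Then the Latin squares (where $A_1 = 1$)

$$L_i = \begin{pmatrix} 1 & A_2 & \cdots & A_m \\ A_2^{S^i} & A_2^{S^i}A_2 & \cdots & A_2^{S^i}A_m \\ \vdots & \vdots & & \vdots \\ A_m^{S^i} & A_m^{S^i}A_2 & \cdots & A_m^{S^i}A_m \end{pmatrix} \qquad (i = 0, 1, \ldots, g)$$

are orthogonal.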

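Proof: Assume that $L_i$ is not orthogonal to $L_j$. Let $L_{ij}$ be the square that results if $L_j$ is superimposed on $L_i$. Then for some $k$ and $l$ and some $r$ and $s$, the same pair of elements would occur in the $k$th row and $l$th column and in the $r$th row and $s$th column. That is, we should have

$$A_k^{S^i}A_l = A_r^{S^i}A_s, \tag{1}$$

$$A_k^{S^j}A_l = A_r^{S^j}A_s. \tag{2}$$

By taking the inverse elements it follows from (2) that

$$A_l^{-1}\big(A_k^{S^j}\big)^{-1} = A_s^{-1}\big(A_r^{S^j}\big)^{-1}. \tag{3}$$

Multiplying (1) and (3), we obtain

$$A_k^{S^i}\big(A_k^{S^j}\big)^{-1} = A_r^{S^i}\big(A_r^{S^j}\big)^{-1}.$$

Multiplying by $\big(A_r^{S^i}\big)^{-1}$ on the left and by $A_k^{S^j}$ on the right, we obtain

$$\big(A_r^{S^i}\big)^{-1}A_k^{S^i} = \big(A_r^{S^j}\big)^{-1}A_k^{S^j}.$$

Since $S^i$ and $S^j$ are automorphisms, we have

$$\big(A_r^{-1}A_k\big)^{S^i} = \big(A_r^{-1}A_k\big)^{S^j}.$$

Assuming $i > j$, then

$$\Big[\big(A_r^{-1}A_k\big)^{S^{i-j}}\Big]^{S^j} = \big(A_r^{-1}A_k\big)^{S^j},$$

and since $S^j$ is one-to-one, $\big(A_r^{-1}A_k\big)^{S^{i-j}} = A_r^{-1}A_k$. Because $i \le g$ and $j \ge 0$, we have $0 < i - j \le g$. By assumption, therefore, $S^{i-j}$ can leave only 1 fixed. Therefore

$$A_r^{-1}A_k = 1, \qquad A_r = A_k.$$

Hence $r = k$, and then also, by (1),

$$A_l = A_s.$$

Therefore $r = k$ and $l = s$. Hence the two compartments of $L_{ij}$ cannot be different, and our statement is proved.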
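The theorem can be checked on a small example of our own choosing (not one from the text). Take the residues mod 7 under addition, where the unit element 1 of the text becomes 0, and the automorphism $S: a \to 3a$. Since 3 has order 6 mod 7, the powers $S, S^2, \ldots, S^5$ fix only 0. So $g = 5$, and the theorem promises six mutually orthogonal squares. The function names in this minimal sketch are illustrative.

```python
from itertools import combinations


def square(i, m=7, s=3):
    """L_i for the additive group mod m with S: a -> s*a; entry (k, l) is S^i(a_k) + a_l."""
    factor = pow(s, i, m)
    return [[(factor * k + l) % m for l in range(m)] for k in range(m)]


def orthogonal(A, B):
    """Superimposing A and B gives every ordered pair of symbols exactly once."""
    n = len(A)
    return len({(A[r][c], B[r][c]) for r in range(n) for c in range(n)}) == n * n


squares = [square(i) for i in range(6)]  # i = 0, ..., g with g = 5
print(all(orthogonal(A, B) for A, B in combinations(squares, 2)))  # True
print(orthogonal(square(0), square(6)))  # False: S^6 is the identity map
```

Stopping at $i = 5$ matters: $S^6$ fixes every element, and the corresponding square coincides with $L_0$.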

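We see, therefore, that we can construct a set of $g + 1$ orthogonal Latin squares if we can find a group $G$ and an automorphism $S$ of $G$ such that

$$S, S^2, \ldots, S^g$$

map no element into itself except the unit element.

Suppose $g = m - 2$. Then the $m - 1$ elements different from 1 form a single cycle under $S$, and we may write $A_k = A_2^{S^{k-2}}$ $(k = 2, \ldots, m)$, so that

$$L_0 = \begin{pmatrix} 1 & A_2 & \cdots & A_m \\ A_2 & A_2A_2 & \cdots & A_2A_m \\ \vdots & \vdots & & \vdots \\ A_m & A_mA_2 & \cdots & A_mA_m \end{pmatrix}.$$

Then the $(k-1)$st row of $L_{i+1}$ equals the $k$th row of $L_i$. Consequently all squares may be obtained from $L_0$ by a cyclical permutation of the rows below the first.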

We shall now consider commutative groups of prime power order defined by the relations

$$P_1^p = P_2^p = \cdots = P_n^p = 1, \qquad P_iP_j = P_jP_i.$$

The elements of this group $G$ have the form

$$P_1^{e_1}\cdots P_n^{e_n}, \qquad e_1, \ldots, e_n = 0, 1, \ldots, p - 1.$$

We call $P_1, \ldots, P_n$ a basis of $G$. We can easily change the basis. For instance, if $P_1, \ldots, P_n$ is a basis, then $P_1, P_1P_2, \ldots, P_1P_n$ is also a basis. For every expression we have

$$P_1^{e_1}\cdots P_n^{e_n} = P_1^{e_1 - e_2 - \cdots - e_n}(P_1P_2)^{e_2}\cdots(P_1P_n)^{e_n},$$

since $G$ is commutative. Such a change of basis defines an automorphism of $G$ at the same time. For let $P_1', \ldots, P_n'$ be the new basis. We can map

$$P_1^{e_1}\cdots P_n^{e_n} \quad \text{into} \quad P_1'^{\,e_1}\cdots P_n'^{\,e_n}.$$

On the other hand, an automorphism is determined if we know on what elements the basis elements are mapped.
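The change of basis becomes concrete if each element $P_1^{e_1}\cdots P_n^{e_n}$ is written as its exponent vector mod $p$, so that multiplication in $G$ becomes addition of vectors. The minimal sketch below uses our own names, with $p = n = 3$ chosen arbitrarily. It checks that the map determined by $P_1 \to P_1$ and $P_k \to P_1P_k$ is one-to-one and respects multiplication, so it is an automorphism.

```python
from itertools import product


def change_basis(v, p):
    """Exponents of P_1^{e_1}...P_n^{e_n} after P_1 -> P_1 and P_k -> P_1 P_k (k >= 2).

    P_1^{e_1} (P_1 P_2)^{e_2} ... (P_1 P_n)^{e_n} = P_1^{e_1 + ... + e_n} P_2^{e_2} ... P_n^{e_n}.
    """
    return (sum(v) % p,) + tuple(e % p for e in v[1:])


def add(u, w, p):
    """Multiplication in G, written additively on exponent vectors."""
    return tuple((a + b) % p for a, b in zip(u, w))


p, n = 3, 3
elements = list(product(range(p), repeat=n))
one_to_one = len({change_basis(v, p) for v in elements}) == len(elements)
homomorphism = all(
    change_basis(add(u, w, p), p) == add(change_basis(u, p), change_basis(w, p), p)
    for u in elements for w in elements
)
print(one_to_one, homomorphism)  # True True
```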

It can be shown that every such group admits an automorphism $S$ such that $S, S^2, \ldots, S^{p^n-2}$ leave no element fixed except 1. Hence, if $p$ is a prime, we can always construct a set of $p^n - 1$ orthogonal squares of side $p^n$. We shall give these automorphisms explicitly for the groups of order 8, 9, 16, 25 and 27.

As an example, let us construct 7 orthogonal 8-sided squares. We shall use the group $G$ generated by $P$, $Q$, $R$, where $P^2 = Q^2 = R^2 = 1$. We use the automorphism $S$, where

$$P^S = Q, \qquad Q^S = R, \qquad R^S = PQ.$$

We then have $P^S = Q$, $P^{S^2} = R$, $P^{S^3} = PQ$, $P^{S^4} = P^SQ^S = QR$, $P^{S^5} = Q^SR^S = R \cdot PQ = PQR$, $P^{S^6} = P^SQ^SR^S = QRPQ = PR$, and $P^{S^7} = P^SR^S = QPQ = P$. If we write the elements in the order $1, P, P^S, P^{S^2}, \ldots, P^{S^6}$, we obtain the following multiplication table:


1    P    Q    R    PQ   QR   PQR  PR
P    1    PQ   PR   Q    PQR  QR   R
Q    PQ   1    QR   P    R    PR   PQR
R    PR   QR   1    PQR  Q    PQ   P
PQ   Q    P    PQR  1    PR   R    QR
QR   PQR  R    Q    PR   1    P    PQ
PQR  QR   PR   PQ   R    P    1    Q
PR   R    PQR  P    QR   PQ   Q    1


The other squares are then obtained by a cyclical permutation of the rows of 
this square. We now write 2 instead of P, 3 instead of Q, etc. and obtain: 


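1 2 3 4 5 6 7 8
2 1 5 8 3 7 6 4
3 5 1 6 2 4 8 7
4 8 6 1 7 3 5 2
5 3 2 7 1 8 4 6
6 7 4 3 8 1 2 5
7 6 8 5 4 2 1 3
8 4 7 2 6 5 3 1

1 2 3 4 5 6 7 8
3 5 1 6 2 4 8 7
4 8 6 1 7 3 5 2
5 3 2 7 1 8 4 6
6 7 4 3 8 1 2 5
7 6 8 5 4 2 1 3
8 4 7 2 6 5 3 1
2 1 5 8 3 7 6 4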


and so forth. 
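The order-8 construction can be replayed mechanically. The minimal sketch below stores elements of $G$ as exponent vectors $(a, b, c)$ for $P^aQ^bR^c$ over GF(2); the helper names are ours. It lists the elements in the order $1, P, P^S, \ldots, P^{S^6}$, builds the key square, and forms the other six squares by cyclically permuting the last seven rows. It then checks that all seven squares are mutually orthogonal.

```python
from itertools import combinations


def mul(u, v):
    """Product in G = <P, Q, R> with P^2 = Q^2 = R^2 = 1, on exponent vectors over GF(2)."""
    return tuple((a + b) % 2 for a, b in zip(u, v))


def S(v):
    """Automorphism P -> Q, Q -> R, R -> PQ: P^a Q^b R^c -> Q^a R^b (PQ)^c = P^c Q^(a+c) R^b."""
    a, b, c = v
    return (c, (a + c) % 2, b)


# Elements in the order 1, P, P^S, ..., P^(S^6), as in the text.
order = [(0, 0, 0), (1, 0, 0)]
while len(order) < 8:
    order.append(S(order[-1]))
symbol = {g: k + 1 for k, g in enumerate(order)}  # 1 -> 1, P -> 2, Q -> 3, ...

key = [[symbol[mul(g, h)] for h in order] for g in order]

# L_i keeps the first row and cyclically permutes the last seven rows i places.
squares = [[key[0]] + key[1 + i:] + key[1:1 + i] for i in range(7)]


def orthogonal(A, B):
    n = len(A)
    return len({(A[r][c], B[r][c]) for r in range(n) for c in range(n)}) == n * n


print(key[1])  # row of P: [2, 1, 5, 8, 3, 7, 6, 4]
print(all(orthogonal(A, B) for A, B in combinations(squares, 2)))  # True
```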

For the group of order 9, generated by $P$, $Q$ with the relations $P^3 = Q^3 = 1$, the automorphism $P^S = Q$, $Q^S = PQ$ has the property that $S, S^2, \ldots, S^7$ map no element into itself except 1. For the group of order 16 we have 4 basis elements $P, Q, R, T$ with $P^2 = Q^2 = R^2 = T^2 = 1$, and $S$ can be given by $P^S = Q$, $Q^S = R$, $R^S = T$, $T^S = PT$.

For the group of order 25 we have two basis elements $P$, $Q$ with $P^5 = Q^5 = 1$, and the automorphism is given by $P^S = Q$, $Q^S = P^3Q$.

The group of order 27 is generated by $P$, $Q$, $R$, and the defining relations are $P^3 = Q^3 = R^3 = 1$. The automorphism is given by $P^S = Q$, $Q^S = R$, $R^S = P^2Q$.
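The automorphisms listed above can be verified by brute force. In this minimal sketch (helper names are ours), each element $P_1^{e_1}\cdots P_n^{e_n}$ is an exponent vector mod $p$. $S$ sends each basis element to the next one, and the last one to the vector in `cases`, which is transcribed from the list above. The check confirms that $S, S^2, \ldots, S^{p^n-2}$ fix nothing but 1. Two deliberately altered choices fail, which shows the property is not automatic.

```python
from itertools import product


def apply_S(v, last_image, p):
    """Apply S: P_k -> P_(k+1) for k < n and P_n -> last_image, to an exponent vector v mod p."""
    n = len(v)
    out = [0] * n
    for k in range(n - 1):
        out[k + 1] = (out[k + 1] + v[k]) % p
    for k in range(n):
        out[k] = (out[k] + v[n - 1] * last_image[k]) % p
    return tuple(out)


def only_identity_fixed(p, last_image):
    """True if S, S^2, ..., S^(p^n - 2) fix no element except the identity."""
    n = len(last_image)
    m = p ** n
    for v in product(range(p), repeat=n):
        if not any(v):
            continue  # the identity 1 has all exponents zero
        w = v
        for _ in range(1, m - 1):
            w = apply_S(w, last_image, p)
            if w == v:
                return False
    return True


# Image of the last basis element as an exponent vector, e.g. (1, 1, 0) means PQ.
cases = {
    8: (2, (1, 1, 0)),      # R^S = PQ
    9: (3, (1, 1)),         # Q^S = PQ
    16: (2, (1, 0, 0, 1)),  # T^S = PT
    25: (5, (3, 1)),        # Q^S = P^3 Q
    27: (3, (2, 1, 0)),     # R^S = P^2 Q
}
for m, (p, last) in cases.items():
    print(m, only_identity_fixed(p, last))
```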

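We have now shown

Theorem 2: Let $m = p^n$, and let $G$ be the commutative group generated by $P_1, \ldots, P_n$, which satisfy the relations $P_1^p = P_2^p = \cdots = P_n^p = 1$. Let $S$ be an automorphism such that $P^{S^i} \neq P$ if $0 < i \le m - 2$, $P \neq 1$. Then the Latin squares

$$L_i = \begin{pmatrix} 1 & P & P^S & \cdots & P^{S^{m-2}} \\ P^{S^i} & P^{S^i}P & P^{S^i}P^S & \cdots & P^{S^i}P^{S^{m-2}} \\ P^{S^{i+1}} & P^{S^{i+1}}P & P^{S^{i+1}}P^S & \cdots & P^{S^{i+1}}P^{S^{m-2}} \\ \vdots & \vdots & \vdots & & \vdots \\ P^{S^{i+m-2}} & P^{S^{i+m-2}}P & P^{S^{i+m-2}}P^S & \cdots & P^{S^{i+m-2}}P^{S^{m-2}} \end{pmatrix} \qquad (i = 0, 1, \ldots, m - 2)$$

are orthogonal. $L_i$ is obtained from $L_{i-1}$ by a cyclical permutation of its last $m - 1$ rows.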


4. Remarks on the largest number of $m$-sided orthogonal Latin squares, for arbitrary $m$.

The general problem can be formulated as follows: given a number $m$, what is the greatest number of orthogonal $m$-sided squares?

It is clear that this number cannot be larger than $m - 1$. By renaming the numbers of the squares, we can always transform them, without changing their orthogonality, so that the first row of each is $1, 2, \ldots, m$. Hence, for any two squares, the pairs $(1,1), (2,2), \ldots, (m,m)$ occur in the first row of the resulting superimposed square. Therefore the numbers in the first column and second row of the squares must differ from 1 (which already stands in that column) and must differ from each other. But we have only the numbers $2, \ldots, m$ at our disposal, and these are only $m - 1$ numbers.

We have shown that if $m$ is the power of a prime, $m - 1$ orthogonal squares can always be constructed by the use of groups. Hence our problem is solved if $m$ is the power of a prime. Very little is known about numbers which are not prime powers. Tarry (Compte Rendu, 1900) has shown that no 6-sided Graeco-Latin square exists. It is conjectured, but not yet proved, that no Graeco-Latin square of side $4n + 2$ exists. We shall, however, show the following. Let $m = p_1^{e_1}\cdots p_n^{e_n}$, where the $p_i$ are prime numbers ($p_i \neq p_j$ for $i \neq j$), and let $r = \min_i p_i^{e_i} - 1$. Then $r$ orthogonal Latin squares can be constructed from commutative groups of order $m$.
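The bound $r = \min_i p_i^{e_i} - 1$ depends only on the factorization of $m$. The minimal sketch below computes it by trial division; the function names are hypothetical. For $m = 4n + 2$ the smallest prime-power factor is 2, so the construction yields only $r = 1$ square. This is consistent with the remark above about Graeco-Latin squares of side $4n + 2$.

```python
def prime_power_factors(m):
    """Return the prime-power factors p_i^{e_i} of m >= 2, found by trial division."""
    factors = []
    d = 2
    while d * d <= m:
        if m % d == 0:
            q = 1
            while m % d == 0:
                m //= d
                q *= d
            factors.append(q)
        d += 1
    if m > 1:
        factors.append(m)  # the remaining cofactor is a prime
    return factors


def group_construction_bound(m):
    """r = min p_i^{e_i} - 1, the number of orthogonal m-sided squares the construction gives."""
    return min(prime_power_factors(m)) - 1


for m in (6, 10, 12, 15, 20, 35):
    print(m, prime_power_factors(m), group_construction_bound(m))
```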

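We take the group $G$ of order $m$ generated by $e_1$ elements of order $p_1$, $e_2$ elements of order $p_2$, $\ldots$, $e_n$ elements of order $p_n$. We determine the automorphisms $T_i$ of the subgroup generated by the elements of order $p_i$ such that $T_i, T_i^2, \ldots, T_i^{p_i^{e_i}-2}$ leave no element of order $p_i$ fixed. We then define an automorphism $\bar T_i$ of $G$ by changing the basis elements of order $p_i$ in the same way as they are changed by $T_i$ and leaving the other basis elements fixed. Then

$$T = \bar T_1 \bar T_2 \cdots \bar T_n$$

is an automorphism whose first $r - 1$ powers leave no element fixed except 1. (An element fixed by $T^k$ must have each of its $p_i$-components fixed by $T_i^k$, and $k \le r - 1 \le p_i^{e_i} - 2$ for every $i$.) Hence the $r$ Latin squares

$$L_i = \begin{pmatrix} 1 & A_2 & \cdots & A_m \\ A_2^{T^i} & A_2^{T^i}A_2 & \cdots & A_2^{T^i}A_m \\ \vdots & \vdots & & \vdots \\ A_m^{T^i} & A_m^{T^i}A_2 & \cdots & A_m^{T^i}A_m \end{pmatrix} \qquad (i = 0, 1, \ldots, r - 1)$$

are orthogonal.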


TABLE I

(The symbols 1 to 20 denote, in this order, the elements $1 \mid P, Q, PQ \mid PR, QR^2, PQR^4, PR^3, QR, PQR^2, PR^4, QR^3, PQR, PR^2, QR^4, PQR^3 \mid R, R^2, R^4, R^3$; the vertical bars mark the cycles of $T$.)

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 2  1  4  3 17 10 15 20 13  6 19 16  9 18  7 12  5 14 11  8
 3  4  1  2 13 18 11 16 17 14  7 20  5 10 19  8  9  6 15 12
 4  3  2  1  9 14 19 12  5 18 15  8 17  6 11 20 13 10  7 16
 5 17 13  9 18 16  3 19 10 12  1  7  6 20  4 15 14  8  2 11
 6 10 18 14 16 19  5  4 20 11 13  1  8  7 17  2 12 15  9  3
 7 15 11 19  3  5 20  6  2 17 12 14  1  9  8 18  4 13 16 10
 8 20 16 12 19  4  6 17  7  3 18 13 15  1 10  9 11  2 14  5
 9 13 17  5 10 20  2  7 18  8  4 19 14 16  1 11  6 12  3 15
10  6 14 18 12 11 17  3  8 19  9  2 20 15  5  1 16  7 13  4
11 19  7 15  1 13 12 18  4  9 20 10  3 17 16  6  2  5  8 14
12 16 20  8  7  1 14 13 19  2 10 17 11  4 18  5 15  3  6  9
13  9  5 17  6  8  1 15 14 20  3 11 18 12  2 19 10 16  4  7
14 18 10  6 20  7  9  1 16 15 17  4 12 19 13  3  8 11  5  2
15  7 19 11  4 17  8 10  1  5 16 18  2 13 20 14  3  9 12  6
16 12  8 20 15  2 18  9 11  1  6  5 19  3 14 17  7  4 10 13
17  5  9 13 14 12  4 11  6 16  2 15 10  8  3  7 18 20  1 19
18 14  6 10  8 15 13  2 12  7  5  3 16 11  9  4 20 19 17  1
19 11 15  7  2  9 16 14  3 13  8  6  4  5 12 10  1 17 20 18
20  8 12 16 11  3 10  5 15  4 14  9  7  2  6 13 19  1 18 17


We shall exemplify this by constructing 3 orthogonal squares of side 20. We use the group $G$ generated by $P$, $Q$, $R$ with the defining relations $P^2 = Q^2 = 1$, $R^5 = 1$. The automorphisms are given by

$$P^{T_1} = Q, \quad Q^{T_1} = PQ, \quad R^{T_1} = R; \qquad P^{T_2} = P, \quad Q^{T_2} = Q, \quad R^{T_2} = R^2.$$

Hence $T = T_1T_2$ is given by $P^T = Q$, $Q^T = PQ$, $R^T = R^2$. Therefore we have:

$$P^T = Q, \quad P^{T^2} = PQ, \quad P^{T^3} = P^TQ^T = Q \cdot PQ = P;$$

$$(PR)^T = QR^2, \quad (PR)^{T^2} = PQR^4, \quad (PR)^{T^3} = PR^3, \quad (PR)^{T^4} = QR, \quad (PR)^{T^5} = PQR^2, \quad (PR)^{T^6} = PR^4,$$

$$(PR)^{T^7} = QR^3, \quad (PR)^{T^8} = PQR, \quad (PR)^{T^9} = PR^2, \quad (PR)^{T^{10}} = QR^4, \quad (PR)^{T^{11}} = PQR^3, \quad (PR)^{T^{12}} = PR;$$

$$R^T = R^2, \quad R^{T^2} = R^4, \quad R^{T^3} = R^3, \quad R^{T^4} = R.$$

We need only construct one key square if we write down the elements in the way in which they are written above. Then we have only to mark the end of each cycle. Thus in our present case we have:

$$1 \mid P, Q, PQ \mid PR, QR^2, PQR^4, PR^3, QR, PQR^2, PR^4, QR^3, PQR, PR^2, QR^4, PQR^3 \mid R, R^2, R^4, R^3 \mid$$
The vertical lines mark the cycles in which the elements are permuted by the automorphisms. We then write down the key square in Table I. From this key square we can easily obtain a set of 3 orthogonal squares by permuting the rows within the cycles indicated. Because of space difficulties we give only the first half of the square in Table II.

TABLE II

(Columns 1 to 10 of the three superimposed squares; each entry lists the symbols of $L_0$, $L_1$, $L_2$ in that cell.)

1,1,1    2,2,2    3,3,3    4,4,4    5,5,5    6,6,6    7,7,7    8,8,8    9,9,9    10,10,10
2,3,4    1,4,3    4,1,2    3,2,1    17,13,9  10,18,14 15,11,19 20,16,12 13,17,5  6,14,18
3,4,2    4,3,1    1,2,4    2,1,3    13,9,17  18,14,10 11,19,15 16,12,20 17,5,13  14,18,6
4,2,3    3,1,4    2,4,1    1,3,2    9,17,13  14,10,18 19,15,11 12,20,16 5,13,17  18,6,14
5,6,7    17,10,15 13,18,11 9,14,19  18,16,3  16,19,5  3,5,20   19,4,6   10,20,2  12,11,17
6,7,8    10,15,20 18,11,16 14,19,12 16,3,19  19,5,4   5,20,6   4,6,17   20,2,7   11,17,3
7,8,9    15,20,13 11,16,17 19,12,5  3,19,10  5,4,20   20,6,2   6,17,7   2,7,18   17,3,8
8,9,10   20,13,6  16,17,14 12,5,18  19,10,12 4,20,11  6,2,17   17,7,3   7,18,8   3,8,19
9,10,11  13,6,19  17,14,7  5,18,15  10,12,1  20,11,13 2,17,12  7,3,18   18,8,4   8,19,9
10,11,12 6,19,16  14,7,20  18,15,8  12,1,7   11,13,1  17,12,14 3,18,13  8,4,19   19,9,2
11,12,13 19,16,9  7,20,5   15,8,17  1,7,6    13,1,8   12,14,1  18,13,15 4,19,14  9,2,20
12,13,14 16,9,18  20,5,10  8,17,6   7,6,20   1,8,7    14,1,9   13,15,1  19,14,16 2,20,15
13,14,15 9,18,7   5,10,19  17,6,11  6,20,4   8,7,17   1,9,8    15,1,10  14,16,1  20,15,5
14,15,16 18,7,12  10,19,8  6,11,20  20,4,15  7,17,2   9,8,18   1,10,9   16,1,11  15,5,1
15,16,5  7,12,17  19,8,13  11,20,9  4,15,18  17,2,16  8,18,3   10,9,19  1,11,10  5,1,12
16,5,6   12,17,10 8,13,18  20,9,14  15,18,16 2,16,19  18,3,5   9,19,4   11,10,20 1,12,11
17,18,19 5,14,11  9,6,15   13,10,7  14,8,2   12,15,9  4,13,16  11,2,14  6,12,3   16,7,13
18,19,20 14,11,8  6,15,12  10,7,16  8,2,11   15,9,3   13,16,10 2,14,5   12,3,15  7,13,4
19,20,17 11,8,5   15,12,9  7,16,13  2,11,14  9,3,12   16,10,4  14,5,11  3,15,6   13,4,16
20,17,18 8,5,14   12,9,6   16,13,10 11,14,8  3,12,15  10,4,13  5,11,2   15,6,12  4,16,7
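Table I and the three squares of Table II can be regenerated from the group itself. The minimal sketch below stores elements as exponent triples $(a, b, c)$ for $P^aQ^bR^c$; the helper names are ours. It lists the elements cycle by cycle exactly as above and builds $L_0, L_1, L_2$ from the rule that the entry in row $k$, column $l$ of $L_i$ is $A_k^{T^i}A_l$. It then checks orthogonality. It also shows why the construction stops at $r = 3$: $T^3$ fixes $P$.

```python
from itertools import combinations


def mul(u, v):
    """Product in G = <P, Q, R>, P^2 = Q^2 = R^5 = 1, on exponent triples (a, b, c) of P^a Q^b R^c."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2, (u[2] + v[2]) % 5)


def T(g):
    """T: P -> Q, Q -> PQ, R -> R^2, so P^a Q^b R^c -> P^b Q^(a+b) R^(2c)."""
    a, b, c = g
    return (b, (a + b) % 2, (2 * c) % 5)


def T_power(i, g):
    for _ in range(i):
        g = T(g)
    return g


def cycle(g):
    """The cycle of g under T, in the order g, g^T, g^(T^2), ..."""
    out = [g]
    while T(out[-1]) != g:
        out.append(T(out[-1]))
    return out


identity, P, PR, R = (0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)
order = [identity] + cycle(P) + cycle(PR) + cycle(R)  # 1 | P, Q, PQ | PR, ... | R, ...
symbol = {g: k + 1 for k, g in enumerate(order)}


def square(i):
    """L_i: the entry in row k, column l is A_k^(T^i) A_l, written as a symbol 1..20."""
    return [[symbol[mul(T_power(i, a), b)] for b in order] for a in order]


def orthogonal(A, B):
    n = len(A)
    return len({(A[r][c], B[r][c]) for r in range(n) for c in range(n)}) == n * n


squares = [square(i) for i in range(3)]
print(all(orthogonal(A, B) for A, B in combinations(squares, 2)))  # True
print(orthogonal(square(0), square(3)))  # False: T^3 fixes P
```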

One might hope that other groups would yield more than $r = \min_i p_i^{e_i} - 1$ orthogonal squares. It has been shown, however, that any group and its automorphisms yield at most $r$ orthogonal squares. A more general method based on groups is given in a recent paper (H. B. Mann, “The construction of sets of orthogonal Latin squares,” Annals of Math. Stat., Vol. 13 (1942)). It can be shown that this more general method, too, cannot construct a $(4n + 2)$-sided Graeco-Latin square.




ON THE DEPENDENCE OF SAMPLING INSPECTION PLANS UPON 
POPULATION DISTRIBUTIONS 

By Alexander M. Mood
University of Texas¹

1. Introduction. The foundations of the science of quality control and quality determination have been laid by W. A. Shewhart [1, 2]. His ideas pervade what follows, but they are too well known to require discussion here. One of them, however, should be mentioned specifically: that of statistically controlled production, because it provides the justification for the basic assumption of this paper. When production is statistically controlled, there exists a probability, $P(N, X)$, that a lot of size $N$ will contain $X$ defective items. Shewhart has given a complete discussion of assumptions of this nature.

Sampling inspection of lots may take one of two courses: 

(a) Item inspection, in which a lot is accepted or completely inspected on the 
basis of one or more samples drawn from the lot. 

(b) Lot inspection, in which a lot is accepted or rejected on the basis of one 
or more samples drawn from the lot. 

The former has been extensively studied by Dodge and Romig [3, 4, 5]; the latter 
has received little attention, but some of the basic ideas of Dodge and Romig are 
applicable to this case also. 

In this paper the approach to the general problem of lot inspection will differ from that of Dodge and Romig in one important respect: the role of the population distribution function will be emphasized, whereas they have directed their attention to methods which require no knowledge of the population distribution. Their techniques are particularly valuable when a probability distribution does not exist, that is, when production is not statistically controlled. The interest here will be in the inspection of lots which may be regarded as having been drawn from a statistical population. After the first sample from the first lot has been drawn, something is known of the distribution of that population, and as the inspection proceeds a great body of knowledge may be accumulated. Here, if ever, is a real opportunity to explore and to use a population distribution. The very nature of inspection supplies a continuous flow of information about it. To neglect this information would be wasteful indeed.

It is, therefore, the object of this paper to point the way to more efficient inspection procedures for situations in which production is statistically controlled. The inspection procedure will be considered to be an inferential process. On the basis of one or more samples, and with whatever information is available about the parent distribution, an inference will be made regarding the quality of those items which have not been examined. A distinction is made between the original lot and what remains of the lot after samples have been drawn. The latter is the appropriate subject of the inference, inasmuch as the quality of the sample is exactly known. The importance of this distinction will become clear in the third section of the paper.

¹ On leave to the War Department.

The subject is, unhappily, very briefly developed. The paper contains a few fundamental results and some suggested procedures that may be used to obtain results of more immediate practical value. Time and facilities were not available for preparation of specific sampling plans.

2. Notation and formulae. The conventional notations $P(u)$, $P(u \mid v)$, $P(u, v)$ will be used to denote the probability of $u$, of $u$ given $v$, and of $u$ and $v$, respectively. A lot will contain $N$ items, of which $X$ are defective. A lot from which one sample has been drawn will be called an “$x$-lot”; after $i$ samples have been drawn it will be referred to as an “$x^i$-lot.” The number of items in the $i$-th sample will be $n_i$, of which $x_i$ are defective; the subscript will often be omitted when $i = 1$. The number of items in an $x^i$-lot will be

$$N_i = N - \sum_{t=1}^{i} n_t,$$

of which

$$X_i = X - \sum_{t=1}^{i} x_t$$

are defective.

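The probability of $x_i$ for a given $x^{i-1}$-lot is:

$$P(x_i \mid X_{i-1}) = \binom{n_i}{x_i}\frac{X_{i-1}^{(x_i)}\,(N_{i-1} - X_{i-1})^{(n_i - x_i)}}{N_{i-1}^{(n_i)}}, \tag{1}$$

where $\binom{n_i}{x_i}$ is the binomial coefficient, and

$$u^{(v)} = u(u - 1)(u - 2)\cdots(u - v + 1).$$

Under this conditional distribution, the $m$-th factorial moment of $x_i$ is:

$$E\big(x_i^{(m)} \mid X_{i-1}\big) = \frac{n_i^{(m)}X_{i-1}^{(m)}}{N_{i-1}^{(m)}}, \tag{2}$$

and the $m$-th factorial moment of $X_i$ is:

$$E\big(X_i^{(m)} \mid X_{i-1}\big) = \frac{N_i^{(m)}X_{i-1}^{(m)}}{N_{i-1}^{(m)}}. \tag{3}$$

Repeated application of (3) to (2) results in:

$$E\big(x_i^{(m)}\big) = \frac{n_i^{(m)}\,E\big(X^{(m)}\big)}{N^{(m)}}. \tag{4}$$

In similar fashion it may be shown that:

$$E\Big(\prod_i x_i^{(m_i)}\Big) = \prod_i n_i^{(m_i)}\,\frac{E\big(X^{(m)}\big)}{N^{(m)}}, \qquad m = \sum_i m_i. \tag{5}$$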
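As a sanity check on (1) and (2), the short exact computation below uses Python's `fractions` and `math.comb`; the helper names are ours. It sums the factorial moments of the distribution (1) directly and compares them with (2) for one illustrative lot ($N = 12$, $X = 5$, $n = 4$).

```python
from fractions import Fraction
from math import comb


def falling(u, v):
    """Factorial power u^(v) = u(u - 1)...(u - v + 1), with u^(0) = 1."""
    out = 1
    for k in range(v):
        out *= u - k
    return out


def p_sample(x, n, X, N):
    """Equation (1): probability of x defectives in a sample of n from a lot of N with X defectives."""
    return Fraction(comb(n, x) * falling(X, x) * falling(N - X, n - x), falling(N, n))


def factorial_moment(m, n, X, N):
    """E(x^(m) | X), computed directly from the distribution (1)."""
    return sum(falling(x, m) * p_sample(x, n, X, N) for x in range(n + 1))


N, X, n = 12, 5, 4
for m in range(5):
    direct = factorial_moment(m, n, X, N)
    by_formula = Fraction(falling(n, m) * falling(X, m), falling(N, m))  # equation (2)
    print(m, direct, direct == by_formula)
```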
3. Single sampling. Consider a population of lots of fixed size $N$ such that the probability that a lot will contain $X$ defective items is $P(X)$. If $x$ is the number of defective items in a sample of size $n$ drawn from one of these lots, then the joint probability of $x$ and $X$ is:

$$P(x, X) = \binom{n}{x}\frac{X^{(x)}(N - X)^{(n - x)}}{N^{(n)}}\,P(X). \tag{6}$$

The fundamental result of this paper is:

Theorem 1. Let $x$ be the number of defective items in the sample, and $X_1 = X - x$ the number of defective items in the remainder of the lot. The correlation between $x$ and $X_1$ is positive, zero, or negative according as the variance, $\sigma_X^2$, of $X$ is greater than, equal to, or less than $A - A^2/N$, where $A$ represents the expected value of $X$.

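To prove this statement, one need merely compute the covariance between $x$ and $X_1$:

$$\sigma_x\sigma_{X_1}r_{xX_1} = \sum_{x,X} x(X - x)P(x, X) - E(x)\big(A - E(x)\big). \tag{7}$$

Summing first on $x$ with the aid of (2):

$$\sigma_x\sigma_{X_1}r_{xX_1} = \sum_X \Big[\frac{n}{N}X^2 - \frac{n}{N}X - \frac{n^{(2)}}{N^{(2)}}X^{(2)}\Big]P(X) - E(x)\big(A - E(x)\big),$$

which may be reduced to:

$$\sigma_x\sigma_{X_1}r_{xX_1} = \frac{n(N - n)}{N(N - 1)}\Big[\sigma_X^2 - \Big(A - \frac{A^2}{N}\Big)\Big] \tag{8}$$

by employing the definitions of $A$ and $\sigma_X$ together with the relation

$$E(x) = nA/N,$$

which follows from (4) on putting $m = 1$.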
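Equation (8) and the sign pattern of Theorem 1 can be confirmed by brute force. The minimal sketch below uses exact fractions. The four example distributions are illustrative choices of ours: a single-point law (an instance of $P_-(X)$ in Theorem 3), a law concentrated on $X = 0$ and $X = N$ (an instance of $P_+(X)$), a binomial, and a rectangular law. The function names are also ours.

```python
from fractions import Fraction
from math import comb


def falling(u, v):
    out = 1
    for k in range(v):
        out *= u - k
    return out


def p_sample(x, n, X, N):
    """P(x | X) from (1)."""
    return Fraction(comb(n, x) * falling(X, x) * falling(N - X, n - x), falling(N, n))


def sample_vs_remainder(P, N, n):
    """Exact Cov(x, X1), Var(x), Var(X1), with X1 = X - x, under the joint law (6).

    P maps each X to P(X); the values should be Fractions summing to 1."""
    E = {"x": 0, "X1": 0, "xx": 0, "X1X1": 0, "xX1": 0}
    for X, pX in P.items():
        for x in range(n + 1):
            w = p_sample(x, n, X, N) * pX
            X1 = X - x
            E["x"] += x * w
            E["X1"] += X1 * w
            E["xx"] += x * x * w
            E["X1X1"] += X1 * X1 * w
            E["xX1"] += x * X1 * w
    cov = E["xX1"] - E["x"] * E["X1"]
    return cov, E["xx"] - E["x"] ** 2, E["X1X1"] - E["X1"] ** 2


def covariance_by_8(P, N, n):
    """Right-hand side of (8)."""
    A = sum(X * p for X, p in P.items())
    var = sum((X - A) ** 2 * p for X, p in P.items())
    return Fraction(n * (N - n), N * (N - 1)) * (var - (A - A * A / N))


N, n = 10, 3
q = Fraction(3, 10)
examples = {
    "P_-": {4: Fraction(1)},                         # X = A = 4 with certainty
    "P_+": {0: Fraction(1, 3), 10: Fraction(2, 3)},  # X = 0 or X = N only
    "binomial": {X: comb(N, X) * q**X * (1 - q)**(N - X) for X in range(N + 1)},
    "rectangular": {X: Fraction(1, N + 1) for X in range(N + 1)},
}
for name, P in examples.items():
    cov, var_x, var_X1 = sample_vs_remainder(P, N, n)
    assert cov == covariance_by_8(P, N, n)
    print(name, cov, cov * cov == var_x * var_X1)  # last flag is True when |r| = 1
```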

The fact that $A - A^2/N$ is the variance of a binomial distribution with mean $A$ and range $N$ suggests:

Theorem 2. If $X$ has the binomial distribution

$$P(X) = \binom{N}{X}p^X(1 - p)^{N - X}, \tag{9}$$

then $x$ and $X - x$ are independently distributed.

This statement is readily verified by substituting (9) in (6), and $X_1$ for $X - x$. A rearrangement of factors then gives:

$$P(x, X_1) = \Big[\binom{n}{x}p^x(1 - p)^{n - x}\Big]\Big[\binom{N - n}{X_1}p^{X_1}(1 - p)^{N - n - X_1}\Big].$$

It is clear that additional samples drawn from such lots will have the same property. Thus, sampling of lots drawn from a binomial population will provide no basis whatsoever for inferences concerning the remainder of the lot.
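Theorem 2 can be confirmed exactly for a small case. The sketch below forms $P(x, X_1)$ from (6) and (9) with rational arithmetic and compares it with the product of the two binomial factors. The values $N = 9$, $n = 4$, $p = 2/5$ are arbitrary, and the helper names are ours.

```python
from fractions import Fraction
from math import comb


def falling(u, v):
    out = 1
    for k in range(v):
        out *= u - k
    return out


def p_sample(x, n, X, N):
    """P(x | X) from (1)."""
    return Fraction(comb(n, x) * falling(X, x) * falling(N - X, n - x), falling(N, n))


def binom_pmf(k, size, p):
    return comb(size, k) * p**k * (1 - p)**(size - k)


N, n, p = 9, 4, Fraction(2, 5)
joint = {}
for X in range(N + 1):
    for x in range(n + 1):
        w = p_sample(x, n, X, N) * binom_pmf(X, N, p)  # equation (6) with the binomial (9)
        joint[(x, X - x)] = joint.get((x, X - x), 0) + w

factorises = all(
    joint.get((x, X1), 0) == binom_pmf(x, n, p) * binom_pmf(X1, N - n, p)
    for x in range(n + 1) for X1 in range(N - n + 1)
)
print(factorises)  # True: x and X - x are independent
```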
The question naturally arises as to whether distributions $P(X)$ exist for which

$$r_{xX_1} = \pm 1.$$

Theorem 3. If

$$P(X) = \begin{cases} 1, & X = A, \quad A \neq 0, N, \\ 0, & X \neq A, \end{cases} \tag{10}$$

then $r_{xX_1} = -1$. If

$$P(X) = \begin{cases} p, & X = 0, \\ 1 - p, & X = N, \\ 0, & X = 1, 2, \ldots, N - 1, \end{cases} \tag{11}$$

then $r_{xX_1} = 1$. These are the only distributions which lead to these values of $r_{xX_1}$.

It is first necessary to compute

$$\sigma_x^2 = \frac{n^{(2)}}{N^{(2)}}\Big[\sigma_X^2 + \frac{N - n}{n - 1}\Big(A - \frac{A^2}{N}\Big)\Big], \tag{12}$$

$$\sigma_{X_1}^2 = \frac{(N - n)^{(2)}}{N^{(2)}}\Big[\sigma_X^2 + \frac{n}{N - n - 1}\Big(A - \frac{A^2}{N}\Big)\Big] \tag{13}$$

by means of (2), (3), and (4). These, together with (8), may then be used to reduce the condition $r_{xX_1}^2 = 1$ to the following condition on $P(X)$: either

$$\sum_X (X - A)^2P(X) = 0, \tag{14}$$

or

$$\sum_X X(N - X)P(X) = 0, \tag{15}$$

whence the theorem follows at once. The distributions defined by (10) and (11) will be referred to hereafter as $P_-(X)$ and $P_+(X)$, respectively.

Theorem 4. The correlation, $r_{xX}$, between $x$ and $X$ is positive unless $X$ is distributed by $P_-(X)$, in which case it is zero.

Computing the covariance by means of (2), (3), and (4), one finds that

$$\sigma_x\sigma_Xr_{xX} = n\sigma_X^2/N. \tag{16}$$

The reason for so carefully distinguishing between the $x$-lot and the original lot is now apparent. The number of defective items in the sample is always positively correlated with the number of defective items in the original lot (Theorem 4). However, it may be negatively correlated with the number of defective items in the $x$-lot (Theorem 1). The normal practice is to reject (or completely inspect) the $x$-lot if the sample has an excessive number of defectives. But when the distribution is sharper than a binomial distribution ($\sigma_X^2 < A - A^2/N$), just the reverse should be done. It is assumed, of course, that defective items would be removed from the sample during its inspection when the inspection was non-destructive.
It is clear that the basic rationale of a sampling inspection plan depends on the condition of Theorem 1. Having chosen a sample size $n$ and an acceptance number $a$ (defined by Dodge and Romig [1]), an $x$-lot would be

Accepted when $x \le a$ if $\sigma_X^2 > A - A^2/N$,

Rejected when $x > a$ if $\sigma_X^2 > A - A^2/N$,

but

Accepted when $x > a$ if $\sigma_X^2 < A - A^2/N$,

Rejected when $x \le a$ if $\sigma_X^2 < A - A^2/N$.

Thus, before an efficient inspection plan can be devised, the first two moments of the population distribution must be known accurately enough to determine the sign of $\sigma_X^2 - (A - A^2/N)$.
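The sign rule above is simple enough to state as code. This minimal sketch assumes that estimates `A` and `var_X` of the first two moments are already available, for instance from preliminary samples. The function name and its return strings are ours. The equality case is reported as giving no preference, because the correlation of Theorem 1 is then zero.

```python
def rejection_rule(A, var_X, N):
    """Which x-lots to reject (completely inspect), from the sign of var_X - (A - A^2/N)."""
    binomial_variance = A - A * A / N  # variance of a binomial with mean A and range N
    if var_X > binomial_variance:      # flatter than binomial: positive correlation (Theorem 1)
        return "reject when x > a"
    if var_X < binomial_variance:      # sharper than binomial: negative correlation
        return "reject when x <= a"
    return "no preference"             # zero correlation


print(rejection_rule(50, 400, 1000))  # flatter than binomial
print(rejection_rule(50, 10, 1000))   # sharper than binomial
```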

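4. Multiple sampling. This section gives similar criteria for guidance in formulating more elaborate sampling plans. The actual computations are elementary and will be omitted.

Theorem 5. The mean and variance of the number of defective items in a sample drawn from an $x^{i-1}$-lot are:

$$E(x_i) = n_iA/N, \tag{17}$$

$$\sigma_{x_i}^2 = \frac{n_i^{(2)}}{N^{(2)}}\Big[\sigma_X^2 + \frac{N - n_i}{n_i - 1}\Big(A - \frac{A^2}{N}\Big)\Big]. \tag{18}$$

Theorem 6. The mean and variance of the number of defective items in an $x^i$-lot are:

$$E(X_i) = N_iA/N, \tag{19}$$

$$\sigma_{X_i}^2 = \frac{N_i^{(2)}}{N^{(2)}}\Big[\sigma_X^2 + \frac{N - N_i}{N_i - 1}\Big(A - \frac{A^2}{N}\Big)\Big]. \tag{20}$$

Theorem 7. The correlation between the numbers of defective items in the $i$-th and $j$-th samples is:

$$r_{x_ix_j} = \frac{\sqrt{n_in_j}\,\big[\sigma_X^2 - (A - A^2/N)\big]}{\sqrt{\big[(n_i - 1)\sigma_X^2 + (N - n_i)(A - A^2/N)\big]\big[(n_j - 1)\sigma_X^2 + (N - n_j)(A - A^2/N)\big]}}, \qquad i \neq j. \tag{21}$$

Theorem 8. The correlation between the numbers of defective items in the $i$-th sample and the $x^j$-lot is given by:

$$r_{x_iX_j} = \sqrt{\frac{n_i\big[(N_j - 1)\sigma_X^2 + (N - N_j)(A - A^2/N)\big]}{N_j\big[(n_i - 1)\sigma_X^2 + (N - n_i)(A - A^2/N)\big]}}, \qquad i > j, \tag{22}$$

$$r_{x_iX_j} = \frac{\sqrt{n_iN_j}\,\big[\sigma_X^2 - (A - A^2/N)\big]}{\sqrt{\big[(n_i - 1)\sigma_X^2 + (N - n_i)(A - A^2/N)\big]\big[(N_j - 1)\sigma_X^2 + (N - N_j)(A - A^2/N)\big]}}, \qquad i \le j. \tag{23}$$

Thus, if the sample is part of the lot, the correlation is always positive, even when $X$ has the distribution $P_-(X)$. The only exception is the case covered by Theorem 4, when $j = 0$. The correlations (21) and (23) will be positive or negative in accordance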
with the condition of Theorem 1. The extreme values of all these correlations are again given by the distributions $P_-(X)$ and $P_+(X)$ defined in Theorem 3. When $P(X) = P_+(X)$, they all become plus one; when $P(X) = P_-(X)$, they become:

$$r_{x_ix_j} = -\sqrt{\frac{n_in_j}{(N - n_i)(N - n_j)}}, \tag{24}$$

$$r_{x_iX_j} = \sqrt{\frac{n_i(N - N_j)}{N_j(N - n_i)}}, \qquad i > j, \tag{25}$$

$$r_{x_iX_j} = -\sqrt{\frac{n_iN_j}{(N - n_i)(N - N_j)}}, \qquad i \le j. \tag{26}$$

For $i = j = 1$, this last expression becomes minus one, in accordance with Theorem 3.

5. Formulation of inspection plans. In practice, the formulation of specific sampling inspection plans would naturally begin with the examination of a preliminary sample (or samples) in order to estimate the first two moments of the population distribution. It would then be convenient to have some simple standard functional form which could be fitted to the distribution by means of these first two moments. Such a standard form must obviously contain two arbitrary parameters and should represent a discrete distribution with range $N$. The simplest function known to the author which satisfies these conditions is:

$$P_1(X) = \binom{N}{X}\frac{C^{(X)}D^{(N - X)}}{(C + D)^{(N)}}. \tag{27}$$

But it will be seen that this distribution is always sharper than the binomial distribution with the same range and mean. Hence a second form is suggested,

$$P_2(X) = \binom{N}{X}\frac{(C + X)^{(X)}(D + N - X)^{(N - X)}}{(C + D + N + 1)^{(N)}}, \tag{28}$$

which, it turns out, is always flatter than the binomial distribution with the same range and mean. It is proposed that these two functions be used as standard forms. The simplicity of their functional form is, we believe, a convenience which outweighs the inconvenience of having to study two separate functions.

The factorial moments of these distributions are:

$$\sum_{X=0}^{N} X^{(m)}P_1(X) = \frac{N^{(m)}C^{(m)}}{(C + D)^{(m)}}, \tag{29}$$

$$\sum_{X=0}^{N} X^{(m)}P_2(X) = \frac{N^{(m)}(C + m)^{(m)}}{(C + D + m + 1)^{(m)}}. \tag{30}$$

The variances are:

$$\sum_{X=0}^{N} (X - A)^2P_1(X) = \frac{NCD(C + D - N)}{(C + D)^2(C + D - 1)}, \tag{31}$$

$$\sum_{X=0}^{N} (X - A)^2P_2(X) = \frac{N(C + 1)(D + 1)(N + C + D + 2)}{(C + D + 2)^2(C + D + 3)}. \tag{32}$$
Examination of the expression $\sigma_X^2 - (A - A^2/N)$ reveals that for $P_1(X)$ it is always negative, while for $P_2(X)$ it is always positive. Both $P_1(X)$ and $P_2(X)$ approach the binomial distribution when $C$ and $D$ become large in a fixed ratio. $P_1(X)$ becomes $P_-(X)$ when $C = A$ and $D = N - A$. As $C$ and $D$ become larger, the distribution becomes flatter until in the limit it is the binomial distribution. $P_2(X)$ becomes the rectangular distribution, $P(X) = 1/(N + 1)$, when $C = D = 0$, and becomes sharper as $C$ and $D$ increase.

The two distribution functions will not serve to approximate U-shaped distributions. Also, $P_1(X)$ has the disadvantage that $C$ and $D$ must be integers when they are less than $N$, if negative probabilities are to be avoided. Since $C + D$ will be greater than or equal to $N$ in any case, and much greater than $N$ in most cases, this is not a serious limitation. The two functions are reproduced when the marginal distributions for samples are computed:

$$P_1(x_i) = \sum_{x_1, \ldots, x_{i-1}, X} P(x_1, \ldots, x_i \mid X)P_1(X) = \binom{n_i}{x_i}\frac{C^{(x_i)}D^{(n_i - x_i)}}{(C + D)^{(n_i)}}, \tag{33}$$

$$P_2(x_i) = \sum_{x_1, \ldots, x_{i-1}, X} P(x_1, \ldots, x_i \mid X)P_2(X) = \binom{n_i}{x_i}\frac{(C + x_i)^{(x_i)}(D + n_i - x_i)^{(n_i - x_i)}}{(C + D + n_i + 1)^{(n_i)}}. \tag{34}$$

This is a most valuable property for two reasons. In the first place, it will 
appreciably facilitate the tedious machine calculations necessary in the work of 
providing specific optimum sampling plans. In the second place, it will simplify 
the study of the population distribution of lots by means of samples from those 
lots. 

These two distributions should, then, provide an adequate basis for the 
formulation of sampling inspection plans in most circumstances. 
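The two standard forms can be checked numerically against their stated properties. This minimal sketch uses exact fractions; the parameter values $N = 10$, $C = 12$, $D = 20$ are arbitrary illustrative choices. It confirms that both forms sum to 1 and that their variances agree with the closed forms (31) and (32). It also confirms that $P_1$ is sharper and $P_2$ flatter than the binomial with the same mean, and that $P_2$ is rectangular when $C = D = 0$.

```python
from fractions import Fraction
from math import comb


def falling(u, v):
    """Factorial power u^(v) = u(u - 1)...(u - v + 1)."""
    out = 1
    for k in range(v):
        out *= u - k
    return out


def P1(X, N, C, D):
    """The first standard form, equation (27)."""
    return Fraction(comb(N, X) * falling(C, X) * falling(D, N - X), falling(C + D, N))


def P2(X, N, C, D):
    """The second standard form, equation (28)."""
    return Fraction(comb(N, X) * falling(C + X, X) * falling(D + N - X, N - X),
                    falling(C + D + N + 1, N))


def total_mean_var(P, N, C, D):
    probs = [P(X, N, C, D) for X in range(N + 1)]
    A = sum(X * p for X, p in enumerate(probs))
    var = sum((X - A) ** 2 * p for X, p in enumerate(probs))
    return sum(probs), A, var


N, C, D = 10, 12, 20
total1, A1, var1 = total_mean_var(P1, N, C, D)
total2, A2, var2 = total_mean_var(P2, N, C, D)
print(var1, A1 - A1 * A1 / N)  # P1: smaller than the binomial variance
print(var2, A2 - A2 * A2 / N)  # P2: larger than the binomial variance
```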

6. Efficiency of sampling inspection. There are two aspects to the efficiency of an item inspection plan. The inspection aspect would be measured by the proportion of defective items eliminated. The sampling aspect would be measured by the difference between the proportions of defective and good items examined. These two measures are primarily functions of the amount of inspection. The former will be large when the amount of inspection is large, and the latter will ordinarily be large when the amount of inspection is small. They will not, therefore, serve as useful criteria for excellence. The measure to be used here is:

$$E = R_B - R_A, \tag{35}$$

where $R_B$ is the proportion of defective items examined, and $R_A$ is the proportion of good items examined. It will be zero when the inspection plan is not at all selective, and will be 100% when all of the defective items and none of the good items are examined. It measures both aspects mentioned above, but has the disadvantage that it emphasizes one or the other for different amounts of inspection. It is not, therefore, a particularly good measure of efficiency, but it is a good criterion. It should ordinarily be maximized.

Consider single sampling with an acceptance number, $a$, and a population distribution sharper than the binomial. The number of items inspected on the average per lot is:

$$I = n + (N - n)\sum_{x=0}^{a} P(x), \tag{36}$$

and the number of defective items inspected on the average per lot is:

$$B = E(x) + \sum_{x=0}^{a}\sum_{X=0}^{N}(X - x)P(x, X). \tag{37}$$

The efficiency will be:

$$E = \frac{B}{A} - \frac{I - B}{N - A}, \tag{38}$$

which may be put in the form:

$$E = \frac{N(N - n)}{A(N - A)}\sum_{x=0}^{a}\Big[\sum_{X=0}^{N}\frac{X - x}{N - n}P(x, X) - \frac{A}{N}P(x)\Big] \tag{39}$$

after substituting (36) and (37). This may be further simplified to:

$$E = \frac{N(N - n)}{A(N - A)}\sum_{x=0}^{a}\Big[\frac{x + 1}{n + 1}P_{n+1}(x + 1) - \frac{A}{N}P_n(x)\Big], \tag{40}$$

where $P_m(x)$ is the marginal distribution of $x$ for samples of size $m$. For distributions flatter than the binomial, the summations on $x$ would run from $a + 1$ to $n$ throughout, instead of from $0$ to $a$.

Theorem 9. For a fixed value of $n$, the acceptance number which maximizes $E$ is $a = E(x)$ when $X$ is distributed by $P_1(X)$ or $P_2(X)$.

The expression in the brackets of (40) becomes:

$$\frac{E(x) - x}{C + D - n}P_n(x) \tag{41}$$

when (33) is substituted for $P(x)$, and becomes:

$$\frac{x - E(x)}{C + D + n + 2}P_n(x) \tag{42}$$

when (34) is substituted for $P(x)$. This theorem is true for a wider class of distribution functions, $P(X)$. It is not worth pursuing too deeply, however, because its main value is in the light it throws on the general nature of inspection plans. It will be a rare case in practice when $n$ is fixed and $a$ is unrestricted. Some idea of the manner in which $E$ depends on population distributions can be attained by computing it for some simple distributions, and by examination of equation (40).
E can be 100% only when all submitted items are defective, but it will
obviously be very near 100% when the distribution is P+(X) if samples of one 
are used. However, a more reasonable maximum might be 50% which is the 
largest possible value when the distribution is rectangular (as is shown in the 
next section). As the distribution becomes sharper, the maximum efficiency 
decreases to zero when the binomial distribution is reached. As the distribution 
becomes still sharper, the efficiency increases until it again reaches 50% for the 
distribution P-(X). Thus the efficiency is limited, and, in fact, will ordinarily 
be further reduced by conditions (fixed amount of inspection, or fixed outgoing 
quality level, for example) which will not allow the unrestricted maximum 
efficiency to be used. 


7. Sampling plans for the rectangular distribution. Excluding the extreme distributions, $P_-(X)$ and $P_+(X)$, the distribution which provides the simplest illustration of some of the ideas above is the rectangular one:

$$P(X) = 1/(N + 1), \qquad X = 0, 1, 2, \ldots, N, \tag{43}$$

the mean and variance of which are:

$$A = N/2, \qquad \sigma_X^2 = N(N + 2)/12. \tag{44}$$

The marginal distribution of $x$ is:

$$P(x) = 1/(n + 1), \tag{45}$$

and the efficiency is:

$$E = \frac{2(N - n)(n - a)(a + 1)}{N(n + 1)(n + 2)}. \tag{46}$$

The values of $n$ and $a$ which maximize this expression are:

$$n = \sqrt{N + 2} - 2, \qquad a = \big(\sqrt{N + 2} - 3\big)/2, \tag{47}$$

whence

$$E = \frac{\big(\sqrt{N + 2} - 1\big)^2}{2N}, \tag{48}$$

or nearly 50% for large $N$. This plan eliminates almost 75% of the defective items and entails examination of about 25% of the good items; 50% of all items will be inspected.
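Equation (46) can be checked against the definitions (36) to (38) by exact enumeration. The same check recovers the optimum (47) by brute force. The rectangular law is flatter than the binomial, so the remainder is inspected when $x > a$. This minimal sketch uses our own function names, and the choice $N = 119$ is ours: it makes $\sqrt{N + 2} = 11$, so the optimal $n$ and $a$ are integers.

```python
from fractions import Fraction
from math import comb


def joint(x, X, n, N):
    """P(x, X) from (6) with the rectangular P(X) = 1/(N + 1) of (43)."""
    return Fraction(comb(X, x) * comb(N - X, n - x), comb(N, n) * (N + 1))


def efficiency_exact(N, n, a):
    """E from (36)-(38), inspecting the remainder when x > a (flatter than binomial)."""
    A = Fraction(N, 2)
    I = Fraction(n)     # the sample itself is always inspected
    B = Fraction(n, 2)  # E(x) = nA/N
    for x in range(a + 1, n + 1):
        for X in range(x, N - n + x + 1):
            w = joint(x, X, n, N)
            I += (N - n) * w
            B += (X - x) * w
    return B / A - (I - B) / (N - A)


def efficiency_closed(N, n, a):
    """Equation (46)."""
    return Fraction(2 * (N - n) * (n - a) * (a + 1), N * (n + 1) * (n + 2))


N = 119
for n, a in [(9, 4), (5, 1), (20, 12)]:
    assert efficiency_exact(N, n, a) == efficiency_closed(N, n, a)

best = max((efficiency_closed(N, n, a), n, a) for n in range(1, N) for a in range(n))
print(best)  # (Fraction(50, 119), 9, 4)
```

The optimum $n = 9$, $a = 4$ matches (47), and $E = 50/119 = (11 - 1)^2/(2 \cdot 119)$ matches (48).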

If the proportion of items to be inspected is fixed at r, then the maximization
of E is subject to the restriction:

$$rN = n + \frac{(N-n)(n-a)}{n+1}, \tag{49}$$

and results in:

$$n = \frac{-N(2-r) + \sqrt{N^2(2-r)^2 + N(rN + 2r - 2)(N - Nr - 1)}}{N(1-r) - 1}, \tag{50}$$

or, for large N,

$$n \approx \sqrt{\frac{rN}{1-r}}, \qquad a \approx \sqrt{r(1-r)N}. \tag{51}$$
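The closed form (50) can be checked against a direct numerical maximization. Solving (49) for a and substituting in (46), taken as E = 2(N − n)(n − a)(a + 1)/[N(n + 1)(n + 2)], reduces the efficiency to E = 2(1 − r)(rN − n)(n + 1)/[(N − n)(n + 2)], a function of n alone. That elimination step is not written out in the text and is an assumption of this sketch, as are the function names and the test values of N and r.

```python
import math

def n_from_50(N, r):
    """Sample size n given by (50) when a fraction r of all items is inspected."""
    b = N * (2 - r)
    disc = b * b + N * (r * N + 2 * r - 2) * (N - N * r - 1)
    return (-b + math.sqrt(disc)) / (N * (1 - r) - 1)

def a_from_49(N, r, n):
    """Acceptance number a obtained by solving the restriction (49) for a."""
    return n - (r * N - n) * (n + 1) / (N - n)

def efficiency_on_49(N, r, n):
    """E of (46) with a eliminated through (49)."""
    return 2 * (1 - r) * (r * N - n) * (n + 1) / ((N - n) * (n + 2))

def n_by_search(N, r, tol=1e-9):
    """Ternary search for the maximizing n on (0, rN), where E is unimodal."""
    lo, hi = 0.0, r * N
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if efficiency_on_49(N, r, m1) < efficiency_on_49(N, r, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

N, r = 1000, 0.5
n = n_from_50(N, r)
print(f"(50): n = {n:.3f}, a from (49) = {a_from_49(N, r, n):.3f}")
print(f"direct search: n = {n_by_search(N, r):.3f}")
print(f"(51): n = {math.sqrt(r * N / (1 - r)):.3f}, a = {math.sqrt(r * (1 - r) * N):.3f}")
```

For moderate N the large-N forms (51) are only rough guides; the agreement improves as N grows.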


If the average outgoing quality (as defined by Dodge and Romig) is to be fixed
at p (the proportion of defectives after inspection on the average), then the
maximization of E is subject to the condition:


(52) 


(AT - n)(a + 2)‘« 

^ N(n + 2)<» + (AT - n)(o + 2)(« 


and results in the relation: 


$$(N-n)(n-a) = (a+1)(n+1)(n+2). \tag{53}$$

When N is large relative to 1/p, the solution of these last two equations is ap- 
proximately: 

n = 

(54) 

a = 

The same result would have been obtained had the amount of inspection been 
minimized subject to (52). 



8. Summary. Methods of sampling inspection in current use have been made 
independent of any population distribution that may exist. When production 
is statistically controlled, a population distribution may be postulated. In 
such circumstances it is to be expected that knowledge gained of the population 
by repeated sampling will be a valuable aid in specifying efficient sampling 
inspection techniques. This paper is a preliminary investigation of the relation 
of lot sampling inspection plans to population distributions. 

Lots are assumed to be drawn from a population such that there is a unique
probability that the lot will contain a specified number of defective items. It is
shown that:

1. The number of defective items in a sample from a lot is positively or nega-
tively correlated with the number of defective items in the remainder of
the lot according as the population distribution is “flatter” than or 
“sharper” than a binomial distribution. Distributions are found for which 
this correlation is plus or minus one. 
2. If the distribution is the binomial one, the number of defective items in
the sample is distributed independently of the number of defective items 
in the remainder of the lot. Thus a sample can furnish no basis for an 
inference concerning the remainder of the lot. 

3. The correlation between the number of defective items in the sample and 
the number of defective items in the original lot is positive. 

These results are generalized for repeated sampling of one lot. 

There is discussed a standard functional form which can ordinarily be fitted 
to population distribution functions for purposes of constructing sampling 
inspection plans. 

It is shown, for a class of distribution functions, that a single sampling plan for
nondestructive inspection will be most efficient in a certain sense when the
acceptance number is equal to the expected number of defective items in the 
sample. 

Optimum single sampling plans for nondestructive inspection of lots with a 
rectangular probability distribution are determined for restricted amount of 
inspection and for restricted average outgoing quality. 
ON CARD MATCHING

By T. W. Anderson

Princeton University

1. Introduction. Several authors have discussed the probability of obtaining 
a given number of matched pairs of cards under conditions of random pairing of 
two decks of arbitrary composition. The exact expression for this probability 
(equation (6)) is ordinarily too complicated for use in computing significance 
levels. This is especially true for certain practical applications. For example, 
in a square two-way contingency table in which the categories corresponding to 
rows are identical with those for columns, the sum of the entries in the diagonal 
cells has this distribution. Intuitively one would suspect that the distribution 
is asymptotically normal, as suggested by several authors. In the following 
section proof is given that the number of matched cards is asymptotically 
normally distributed when the total number of cards in each of the two decks 
approaches infinity with the proportion of cards in each suit of each deck remain-
ing fixed. The form of the limiting distribution can then be used in computing
approximate significance levels.

A problem of some interest to psychologists is that of determining whether an 
individual has matched two series of items better than could have been done “by
chance”; for instance, whether a graphologist has matched personality descrip-
tions with specimens of handwriting better than by chance. The problem can
also be phrased in terms of card matching under random pairing of two identical 
decks each of a given number of different cards. This will be recognized as a 
classical problem of probability theory: Let tickets numbered from 1 to n be 
placed in a hat. If the tickets are drawn one by one from the hat, what is the 
probability that the number of the drawing will coincide with the number drawn 
a specified number of times? It is clear how the analogous problem of matching
cards of three or more identical decks of a given number of different cards arises 
(e.g., matching appearance, personality, and handwriting). The latter part of 
the present paper is concerned with this problem. Battin [1] has displayed a 
generating function for the probability of obtaining a given number of matched 
cards between any number of decks of arbitrary composition. Battin’s generat-
ing function is used to derive explicitly the probability of obtaining a specified
number of matched cards and the moments of the distribution. 

2. The Limiting Distribution of the Number of Matched Cards. In the
ordinary card matching problem one is interested in the number of matchings
when two decks, say $D_1$ and $D_2$, are paired randomly. Let $D_1$ consist of
$n_{11}, n_{12}, \ldots, n_{1k}$ cards of suits $S_1, S_2, \ldots, S_k$, respectively, and let
$D_2$ consist of $n_{21}, n_{22}, \ldots, n_{2k}$ cards of suits $S_1, S_2, \ldots, S_k$, respectively
(any $n_{\alpha i}$ can be 0), where

$$\sum_{i=1}^{k} n_{1i} = \sum_{i=1}^{k} n_{2i} = n.$$
Let $t_{ij}$ $(i, j = 1, 2, \ldots, k)$ be the number of pairings each involving a card
from $D_1$ of suit $S_i$ and a card from $D_2$ of suit $S_j$. It is easily seen that the
probability of a set $\|t_{ij}\|$ under random pairing is the same as that associated
with the entries $\|t_{ij}\|$ in a k by k contingency table [2] for which the row totals
are fixed as $n_{11}, n_{12}, \ldots, n_{1k}$, and the column totals are fixed as
$n_{21}, n_{22}, \ldots, n_{2k}$, i.e.,

$$P(t_{ij}) = \frac{\prod_{i=1}^{k} n_{1i}!\,\prod_{j=1}^{k} n_{2j}!}{n!\,\prod_{i,j=1}^{k} t_{ij}!}. \tag{1}$$

The probability of obtaining h matchings is the same as that of the sum of
diagonal terms in a square contingency table, i.e., $h = \sum_{i=1}^{k} t_{ii}$. In fact, in
practical cases, the problem frequently arises in this manner: if two individuals
each classify n objects into k categories, h is the number of objects on whose
classification they agree.

The distribution (1) has essentially $(k-1)^2$ variables since there are $2k - 1$
linear restrictions imposed on the $t_{ij}$. It is easy to verify that, for fixed
$n_{1i}/n = m_{1i}$, say, and fixed $n_{2j}/n = m_{2j}$, say, the distribution (1) approaches
the normal distribution in $(k-1)^2$ linearly independent variables, as n approaches
infinity. Let us substitute

$$x_{ij} = \frac{t_{ij} - n\,m_{1i}\,m_{2j}}{\sqrt{n}} \qquad (i, j = 1, 2, \ldots, k),$$

use Stirling's formula for each factorial in (1), and take the logarithm. The
argument proceeds in a manner similar to the classical case of the limit of the
binomial distribution.

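Since there are imposed linear restrictions on the $t_{ij}$,

$$\sum_{j=1}^{k} t_{ij} = n\,m_{1i} \quad (i = 1, 2, \ldots, k), \qquad \sum_{i=1}^{k} t_{ij} = n\,m_{2j} \quad (j = 1, 2, \ldots, k),$$

there are also restrictions on the $x_{ij}$, namely,

$$\sum_{j=1}^{k} x_{ij} = \sum_{i=1}^{k} x_{ij} = 0.$$

Hence there are $(k-1)^2$ linearly independent $x_{ij}$. If we choose $x_{ij}$
$(i, j = 1, 2, \ldots, k-1)$ as the linearly independent variables, the limiting
probability element, as n approaches infinity, is

$$\frac{1}{(2\pi)^{(k-1)^2/2}\Bigl(\prod_{i=1}^{k} m_{1i} \prod_{j=1}^{k} m_{2j}\Bigr)^{(k-1)/2}}\; e^{-\frac{1}{2}Q} \prod_{i,j=1}^{k-1} dx_{ij}, \tag{2}$$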
where

$$Q = \sum_{i,j=1}^{k} \frac{x_{ij}^2}{m_{1i}\,m_{2j}}$$

is written in terms of all the $x_{ij}$ with the understanding that the linearly
dependent variables are linear functions of the independent variables.

Now $h - E(h)$ is simply a linear combination of the $x_{ii}$, namely,

$$h - E(h) = \sqrt{n} \sum_{i=1}^{k} x_{ii}.$$

Hence, it follows that

$$\frac{h - E(h)}{\sigma_h}$$

is asymptotically normally distributed with mean zero and variance unity. For
large n, then, it is possible to use the normal distribution to approximate
significance levels for h.
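A small simulation illustrates the normal approximation. The sketch pairs a fixed deck D1 with a shuffled deck D2, counts the matchings h, and compares the simulated mean with E(h) = n Σ m_1i m_2i = Σ n_1i n_2i / n, which follows from the expected cell entries n m_1i m_2j used in the substitution above. The deck compositions, trial count and seed are arbitrary illustrative choices, and the standard deviation is estimated from the simulation rather than taken from a formula.

```python
import random
from statistics import mean, pstdev

def simulate_matchings(n1, n2, trials, seed=1):
    """Sample h, the number of matched pairs, under random pairing of two decks.

    n1[i] and n2[i] are the numbers of cards of suit S_i in D1 and D2."""
    rng = random.Random(seed)
    deck1 = [suit for suit, count in enumerate(n1) for _ in range(count)]
    deck2 = [suit for suit, count in enumerate(n2) for _ in range(count)]
    draws = []
    for _ in range(trials):
        rng.shuffle(deck2)  # shuffling one deck against the other gives a random pairing
        draws.append(sum(c1 == c2 for c1, c2 in zip(deck1, deck2)))
    return draws

n1, n2 = [30, 50, 20], [40, 40, 20]   # illustrative deck compositions, n = 100
n = sum(n1)
expected_h = sum(a * b for a, b in zip(n1, n2)) / n   # n * sum_i m_1i * m_2i
draws = simulate_matchings(n1, n2, trials=10000)
sd = pstdev(draws)
share = sum(abs(h - expected_h) / sd < 1.96 for h in draws) / len(draws)
print(f"E(h) = {expected_h}, simulated mean = {mean(draws):.2f}, sd = {sd:.2f}")
print(f"fraction within 1.96 sd of E(h): {share:.3f}")
```

Because h is integer-valued, the fraction falling within 1.96 standard deviations will only be roughly 0.95 for decks of this size.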

Of course, any other linear combination of the entries $t_{ij}$ is asymptotically
normally distributed. The quantity Q in (2) can be recognized as the Pearson
$\chi^2$ for contingency tables, and the above constitutes a proof that it has, in the
limit, the $\chi^2$ distribution with $(k-1)^2$ degrees of freedom.


3. Matchings between three or more decks. There are instances, such as
the classification of n objects into k categories by 3 or more individuals, in which
one is interested in the matchings of three decks or more. For any number of
decks one can prove in a manner exactly analogous to §2 that the distribution
of the number of matchings is asymptotically normal. Here the demonstration
is indicated for three decks. Let us consider three decks $D_\alpha$ $(\alpha = 1, 2, 3)$ with
$n_{\alpha 1}, n_{\alpha 2}, \ldots, n_{\alpha k}$ cards of suits $S_1, S_2, \ldots, S_k$, respectively. Let $t_{gij}$ be the
number of triplets consisting of a card from $S_g$ of $D_1$, a card from $S_i$ of $D_2$, and
a card from $S_j$ of $D_3$ under random formation of triplets (i.e., laying down the
three shuffled decks side by side).

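The probability law of the set $\{t_{gij}\}$ can be derived by the consideration of the
generating function,

$$(x_1y_1z_1 + x_1y_1z_2 + \cdots + x_1y_2z_1 + \cdots + x_2y_1z_1 + \cdots + x_ky_kz_k)^n = \sum \frac{n!}{\prod_{g,i,j} t_{gij}!} \prod_{g,i,j} (x_g y_i z_j)^{t_{gij}}, \tag{3}$$

where the summation extends over all the partitions $\{t_{gij}\}$ of n. The number of
ways of deriving the set $\{t_{gij}\}$ is the coefficient of $\prod_{g,i,j} (x_g y_i z_j)^{t_{gij}}$, namely,

$$\frac{n!}{\prod_{g,i,j} t_{gij}!},$$

where $\sum_{i,j} t_{gij} = n_{1g}$, $\sum_{g,j} t_{gij} = n_{2i}$, and $\sum_{g,i} t_{gij} = n_{3j}$.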
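The total number of ways of getting the marginal totals $n_{1g}$, $n_{2i}$, and $n_{3j}$ is the
coefficient of $\prod_g x_g^{n_{1g}} \prod_i y_i^{n_{2i}} \prod_j z_j^{n_{3j}}$ in (3); that is, in

$$\Bigl(\sum_{g,i,j} x_g y_i z_j\Bigr)^n = \Bigl(\sum_g x_g\Bigr)^n \Bigl(\sum_i y_i\Bigr)^n \Bigl(\sum_j z_j\Bigr)^n,$$

which is

$$\frac{n!}{\prod_g n_{1g}!} \cdot \frac{n!}{\prod_i n_{2i}!} \cdot \frac{n!}{\prod_j n_{3j}!}.$$

The probability of getting the set $\{t_{gij}\}$ is the ratio of these expressions,

$$P(t_{gij}) = \frac{\prod_g n_{1g}!\,\prod_i n_{2i}!\,\prod_j n_{3j}!}{(n!)^2 \prod_{g,i,j} t_{gij}!}. \tag{4}$$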

This formula is analogous to (1) and, indeed, reduces to (1) for $n_{31} = n$, $n_{3j} = 0$
$(j = 2, 3, \ldots, k)$. This is the probability associated with a three-way contingency
table (k by k by k). For a contingency table k by l by m, this probability
would be (4) with the limits on g of 1 and k; on i, 1 and l; and on j, 1
and m.

For fixed values of the ratios $n_{\alpha i}/n = m_{\alpha i}$ $(\alpha = 1, 2, 3;\ i = 1, 2, \ldots, k)$, say,
the $k^3 - 3k + 2$ linearly independent variates in the set $\{t_{gij}\}$ are asymptotically
normally distributed. To demonstrate this, substitute

$$x_{gij} = \frac{t_{gij} - n\,m_{1g}\,m_{2i}\,m_{3j}}{\sqrt{n}} \qquad (g, i, j = 1, 2, \ldots, k)$$

into (4) and use Stirling’s approximation. There are $3k - 2$ independent linear
restrictions on the $x_{gij}$, namely,

$$\sum_{i,j=1}^{k} x_{gij} = \sum_{g,j=1}^{k} x_{gij} = \sum_{g,i=1}^{k} x_{gij} = 0.$$

Therefore, there are $k^3 - 3k + 2$ x’s which are unrestricted. Using these vari-
ables, we find that the limiting probability element of these $x_{gij}$ is


$$K\, e^{-\frac{1}{2}Q} \prod dx_{gij}, \tag{5}$$

where K is a normalizing constant depending only on the $m_{\alpha i}$,

$$Q = \sum_{g,i,j=1}^{k} \frac{x_{gij}^2}{m_{1g}\,m_{2i}\,m_{3j}},$$

and the product of differentials is of $k^3 - 3k + 2$ variables.

The number of matched triplets, u say, is the sum $\sum_{i=1}^{k} t_{iii}$, and we have

$$\frac{u - E(u)}{\sqrt{n}} = \sum_{i=1}^{k} x_{iii}.$$

From these facts it follows that $(u - E(u))/\sqrt{n}$ is asymptotically normally
distributed.

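The above results may be easily generalized. In a q-way contingency table
with fixed marginal totals $n\,m_{\alpha i}$ $(\alpha = 1, 2, \ldots, q;\ i = 1, 2, \ldots, k)$, the prob-
ability of a set $\{t_{gi\cdots j}\}$ is

$$\frac{\prod_{\alpha=1}^{q} \prod_{i=1}^{k} (n\,m_{\alpha i})!}{(n!)^{q-1} \prod t_{gi\cdots j}!}.$$

The entries minus their respective means and divided by $\sqrt{n}$, namely,

$$x_{gi\cdots j} = \frac{t_{gi\cdots j} - n\,m_{1g}\,m_{2i} \cdots m_{qj}}{\sqrt{n}},$$

are asymptotically normally distributed, with density proportional to $e^{-\frac{1}{2}Q}$
in the linearly independent variables, where

$$Q = \sum \frac{x_{gi\cdots j}^2}{m_{1g}\,m_{2i} \cdots m_{qj}}.$$

The generalization of Pearson’s $\chi^2$, namely Q, has in the limit the $\chi^2$-distribution
with $k^q - qk + q - 1$ degrees of freedom. Finally,

$$s = \sum_{i=1}^{k} t_{ii\cdots i},$$

the number of matched q-tuplets under random formation of q-tuplets, is asymp-
totically normally distributed.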


4. Matching cards of identical decks, each of n different cards. The prob- 
ability of obtaining a given number of pairs of matched cards under random 
pairing of two identical decks each of n different cards has been derived by Chap- 
man [3] by a straightforward method and, of course, the solution of the classical 
problem mentioned in the introduction is this probability. Another technique 
involving the use of the general expression for the number of matchings of two 
decks of arbitrary composition can be easily generalized to three or more decks. 
Before discussing this method, let us first derive this general expression by the
use of the generating function discussed by Battin. Consider the multinomial

$$(x_1y_1e^\theta + x_1y_2 + \cdots + x_2y_1 + x_2y_2e^\theta + \cdots + x_ky_ke^\theta)^n.$$

The coefficient of $x_1^{n_{11}} \cdots x_k^{n_{1k}}\, y_1^{n_{21}} \cdots y_k^{n_{2k}}\, e^{h\theta}$ (where k is the number of suits;
$n_{1i}$ the number of cards of suit $S_i$ in the first deck; $n_{2i}$ the number of cards of
suit $S_i$ in the second deck; and $n = \sum n_{1i} = \sum n_{2i}$) is the number of ways the
cards may be arranged so that there are h matchings. After expanding the
multinomial

$$\Bigl[\sum x_i y_i e^\theta + \Bigl(\sum x_i\Bigr)\Bigl(\sum y_i\Bigr) - \sum x_i y_i\Bigr]^n$$

in powers of $x_i$ and $y_i$, taking the proper coefficient, and dividing by the total
number of ways the cards can be arranged, one arrives at the probability law of
h [4],




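$$P(h) = \frac{\prod_{i=1}^{k} n_{1i}!\,\prod_{i=1}^{k} n_{2i}!}{(n!)^2} \sum_{g=0}^{n-h} (-1)^{n-g-h} \binom{n}{g} \binom{n-g}{h}\, T_g, \tag{6}$$

where

$$T_g = (g!)^2 (n-g)! \sum \prod_{i=1}^{k} \frac{1}{(n_{1i} - s_i)!\,(n_{2i} - s_i)!\,s_i!}, \tag{7}$$

where the summation is extended over all $s_i$ satisfying the following conditions:

$$\sum_{i=1}^{k} s_i = n - g, \qquad n_{1i} - s_i \ge 0, \qquad n_{2i} - s_i \ge 0, \qquad s_i \ge 0 \qquad (i = 1, 2, \ldots, k).$$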
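Because (6) and (7) involve alternating sums, a check against direct enumeration is reassuring. The sketch below evaluates (6) and (7) in exact rational arithmetic and compares the result with the distribution obtained by pairing the first deck with every ordering of the second deck's n distinguishable cards. The helper names and the small test compositions are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, factorial, prod

def T(g, n1, n2):
    """T_g of (7): sum over s_1..s_k with sum n - g and 0 <= s_i <= min(n_1i, n_2i)."""
    n = sum(n1)
    total = Fraction(0)
    for s in product(*(range(min(a, b) + 1) for a, b in zip(n1, n2))):
        if sum(s) == n - g:
            denom = prod(factorial(a - si) * factorial(b - si) * factorial(si)
                         for a, b, si in zip(n1, n2, s))
            total += Fraction(1, denom)
    return factorial(g) ** 2 * factorial(n - g) * total

def p_formula(h, n1, n2):
    """P(h) of (6) for decks with suit counts n1 and n2."""
    n = sum(n1)
    lead = Fraction(prod(map(factorial, n1)) * prod(map(factorial, n2)), factorial(n) ** 2)
    return lead * sum((-1) ** (n - g - h) * comb(n, g) * comb(n - g, h) * T(g, n1, n2)
                      for g in range(n - h + 1))

def p_enumerated(n1, n2):
    """Distribution of h from pairing deck 1 with every ordering of deck 2."""
    deck1 = [suit for suit, c in enumerate(n1) for _ in range(c)]
    deck2 = [suit for suit, c in enumerate(n2) for _ in range(c)]
    counts = [0] * (len(deck1) + 1)
    for order in permutations(deck2):  # n! orderings of distinguishable cards
        counts[sum(x == y for x, y in zip(deck1, order))] += 1
    return [Fraction(c, factorial(len(deck1))) for c in counts]

n1, n2 = (2, 2, 1), (1, 3, 1)
print([str(p_formula(h, n1, n2)) for h in range(sum(n1) + 1)])
print([str(p) for p in p_enumerated(n1, n2)])
```

Exact fractions are used so that the alternating sum in (6) can be compared with the enumeration without any rounding.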


From (6) one can easily derive the distribution of the number of matchings
when two identical decks of n different cards are randomly paired. Let $n_{1i} = 1$,
$n_{2i} = 1$, and $n = k$. Then $T_g$ as defined in (7) is

$$T_g = (g!)^2 (n-g)! \sum \prod_{i=1}^{k} \frac{1}{(1-s_i)!\,(1-s_i)!\,s_i!} = \frac{n!}{g!\,(n-g)!}\,(g!)^2 (n-g)! = n!\,g!,$$

for $s_i$ can equal 0 or 1 and there are ${}_nC_g$ choices of the $s_i$. Hence, we find the
probability of the number of matchings v to be

$$P(v) = \frac{1}{v!} \sum_{j=0}^{n-v} \frac{(-1)^j}{j!}. \tag{8}$$


This result has been given by Chapman [3]. It is, in fact, a classical probability 
law. 

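The moment generating function is

$$\varphi(\theta) = \sum_{v=0}^{n} \sum_{j=0}^{n-v} \frac{(-1)^j e^{v\theta}}{v!\,j!} = \sum_{g=0}^{n} \frac{(e^\theta - 1)^g}{g!}.$$

From this expression it is easy to verify that

$$E(v) = 1, \qquad \sigma_v^2 = 1, \qquad E[v(v-1)\cdots(v-r+1)] = 1 \quad (r \le n).$$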


It is interesting to observe that as n approaches infinity, the moment generating
function approaches

$$\sum_{g=0}^{\infty} \frac{(e^\theta - 1)^g}{g!} = e^{e^\theta - 1}. \tag{9}$$

It therefore follows that the limiting form of the distribution is the Poisson dis-
tribution with parameter unity, namely,

$$f(x) = \frac{e^{-1}}{x!} = \frac{1}{e\,x!}. \tag{10}$$


If one writes the moment generating function as

$$\varphi(\theta) = \sum_{g=0}^{n} \frac{1}{g!} \Bigl(\frac{\theta}{1!} + \frac{\theta^2}{2!} + \frac{\theta^3}{3!} + \cdots\Bigr)^g, \tag{11}$$

one can see that the first n powers of θ in (9) are the same as in (11), since the
terms of (9) with g > n contain only powers of θ above the n-th. Hence,
the first n moments of the distribution (8) are the same as those of the Poisson
distribution (10). In particular, it is interesting to observe that in the random
pairing of any two series, such as the serial numbers and order numbers in the
Selective Service drawing, the expected number of matchings is exactly 1.
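The statement that the first n moments of (8) agree with those of the Poisson law (10) can be confirmed exactly for small n. The sketch relies on the standard fact that the moments about the origin of a Poisson distribution with parameter unity are the Bell numbers 1, 2, 5, 15, 52, 203, ...; the function names are illustrative. With n = 5 the first five moments agree, and the sixth falls short of the Poisson value by one.

```python
from fractions import Fraction
from math import comb, factorial

def p_match(v, n):
    """(8): probability of v matchings between two identical decks of n cards."""
    return sum(Fraction((-1) ** j, factorial(j)) for j in range(n - v + 1)) / factorial(v)

def raw_moment(r, n):
    """r-th moment about the origin of distribution (8)."""
    return sum(v ** r * p_match(v, n) for v in range(n + 1))

def bell(r):
    """Bell numbers via B(m+1) = sum_k C(m, k) B(k); these are the Poisson(1) moments."""
    b = [1]
    for m in range(r):
        b.append(sum(comb(m, k) * b[k] for k in range(m + 1)))
    return b[r]

n = 5
for r in range(1, n + 2):
    print(r, raw_moment(r, n), bell(r))
```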

In applications of this method of matching (e.g., matching individuals and 
handwriting), the experiment may be repeated several times. It would be de- 
sirable, therefore, to have the probability law of the mean of a sample. The 
exact distribution, however, is too complicated to use. It follows from the cen-
tral limit theorem that the mean of a sample of N observations from this dis-
tribution is asymptotically normally distributed as N → ∞. It can also be
shown by using the moment generating function that if the observations are from
distributions with different n (i.e., the i-th observation from a pair of decks of
$n_i$ cards, $n_i \ge 2$), the distribution of the mean of the sample is asymptotically
normal.

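Now let us consider the analogue for three decks of cards. The generating
function [1] for the number of matchings of three cards, one from each of three
decks of arbitrary composition as defined in §3, is

$$(x_1y_1z_1e^\theta + x_1y_1z_2 + \cdots + x_1y_2z_1 + \cdots + x_2y_1z_1 + \cdots + x_2y_2z_2e^\theta + \cdots + x_ky_kz_ke^\theta)^n.$$

The probability of obtaining t matched triplets, found after expanding this ex-
pression, is

$$P(t) = \frac{\prod_{\alpha=1}^{3} \prod_{i=1}^{k} n_{\alpha i}!}{(n!)^3} \sum_{g=0}^{n-t} (-1)^{n-g-t} \binom{n}{g} \binom{n-g}{t}\, T_g, \tag{12}$$

where

$$T_g = (g!)^3 (n-g)! \sum \prod_{i=1}^{k} \Bigl[s_i! \prod_{\alpha=1}^{3} (n_{\alpha i} - s_i)!\Bigr]^{-1},$$

and the summation extends over all $s_i$ satisfying

$$\sum_{i=1}^{k} s_i = n - g, \qquad s_i \ge 0, \qquad n_{\alpha i} - s_i \ge 0 \qquad (\alpha = 1, 2, 3;\ i = 1, 2, \ldots, k).$$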

To specialize (12) for the case to be considered here, namely, three identical
decks of n different cards each, we let

$$n_{\alpha i} = 1 \quad (\alpha = 1, 2, 3;\ i = 1, 2, \ldots, k), \qquad n = k.$$

Then, observing that

$$T_g = n!\,(g!)^2,$$

one finds that the probability of t matchings is

$$P(t) = \frac{1}{n!\,t!} \sum_{j=0}^{n-t} (-1)^j \frac{(n-t-j)!}{j!}. \tag{13}$$
The moment generating function is

$$\sum_{t=0}^{n} e^{t\theta} P(t) = \frac{1}{n!} \sum_{g=0}^{n} \frac{(n-g)!}{g!} (e^\theta - 1)^g. \tag{14}$$

One can readily verify that

$$E(t) = \frac{1}{n}, \qquad \sigma_t^2 = \frac{n^2 - n + 1}{n^2(n-1)}. \tag{15}$$

Since both E(t) and $\sigma_t^2$ approach 0 as n approaches infinity, by Tchebycheff’s
inequality we can see that the probability approaches 1 that there will be no
matched triplet as n increases without bound. As in the case of two decks, the
result that the mean of a sample from this population is asymptotically normally
distributed follows from the central limit theorem.

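For the general case of q identical decks each of n different cards we can gen-
eralize (13), (14), and (15) immediately. First, let us note that the probability
of s matched cards for q decks of arbitrary composition is

$$P(s) = \frac{\prod_{\alpha=1}^{q} \prod_{i=1}^{k} n_{\alpha i}!}{(n!)^q} \sum_{g=0}^{n-s} (-1)^{n-g-s} \binom{n}{g} \binom{n-g}{s}\, T_g,$$

where

$$T_g = (g!)^q (n-g)! \sum \prod_{i=1}^{k} \Bigl[s_i! \prod_{\alpha=1}^{q} (n_{\alpha i} - s_i)!\Bigr]^{-1},$$

and the summation extends over all $s_i$ satisfying

$$\sum_{i=1}^{k} s_i = n - g, \qquad s_i \ge 0, \qquad n_{\alpha i} - s_i \ge 0 \qquad (\alpha = 1, 2, \ldots, q;\ i = 1, 2, \ldots, k).$$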

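The probability of w, the number of matchings when each of the q decks consists
of n different cards, is

$$P(w) = \frac{1}{(n!)^{q-2}\,w!} \sum_{y=0}^{n-w} \frac{(-1)^y\,[(n-w-y)!]^{q-2}}{y!}.$$

The moment generating function is

$$\frac{1}{(n!)^{q-2}} \sum_{g=0}^{n} \frac{[(n-g)!]^{q-2}}{g!} (e^\theta - 1)^g.$$

Finally, the mean and variance are

$$E(w) = \frac{1}{n^{q-2}}, \qquad \sigma_w^2 = \frac{n^{q-2} + n^{q-2}(n-1)^{q-2} - (n-1)^{q-2}}{n^{2(q-2)}\,(n-1)^{q-2}}.$$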
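The formulas for q identical decks can be checked by enumerating every arrangement of small decks. The sketch below computes P(w), E(w) and σ_w² exactly from the forms written above (which reduce to (8) for q = 2 and to (13) and (15) for q = 3) and compares them with the distribution obtained by holding the first deck fixed and running through all orderings of the other q − 1 decks. The function names are illustrative, and the variance formula is used only for n ≥ 2.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def p_w(w, n, q):
    """P(w): probability of w matched q-tuplets for q identical decks of n cards."""
    s = sum(Fraction((-1) ** y * factorial(n - w - y) ** (q - 2), factorial(y))
            for y in range(n - w + 1))
    return s / (factorial(n) ** (q - 2) * factorial(w))

def mean_var(n, q):
    """E(w) and sigma_w^2 as given above (requires n >= 2)."""
    a, b = n ** (q - 2), (n - 1) ** (q - 2)
    return Fraction(1, a), Fraction(a + a * b - b, a * a * b)

def enumerate_w(n, q):
    """Exact distribution of w: deck 1 in fixed order, decks 2..q in every order."""
    orders = list(permutations(range(n)))
    counts = [0] * (n + 1)
    for decks in product(orders, repeat=q - 1):
        # position i is a match when every other deck also shows card i there
        counts[sum(all(d[i] == i for d in decks) for i in range(n))] += 1
    return [Fraction(c, len(orders) ** (q - 1)) for c in counts]

for n, q in [(3, 3), (4, 3), (3, 4)]:
    print(n, q, [str(p_w(w, n, q)) for w in range(n + 1)], mean_var(n, q))
```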


5. Summary. Two distinct problems associated with card matching have
been considered in this paper. In the first place it has been shown that the dis- 
tribution of the number of matched pairs obtained under conditions of random 
pairing of two decks of arbitrary composition is asymptotically normal when 
the number of cards in each deck approaches infinity and the proportion of cards 
in each suit remains fixed. This demonstration was extended to the cases of 
matchings between three or more decks. The second problem treated in the 
present paper is concerned with the matchings between identical decks, each of 
n different cards. The probability law for the case of two decks was derived by 
the use of a generating function. When n approaches infinity the limiting
distribution was shown to be Poisson. The case of three or more decks was 
treated in similar manner, with the probability law and the moments given. 
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


THE DETECTION OF DEFECTIVE MEMBERS OF LARGE 

POPULATIONS 

By Robert Dorfman 
Washington, D. C. 

The inspection of the individual members of a large population is an expensive 
and tedious process. Often in testing the results of manufacture the work can 
be reduced greatly by examining only a sample of the population and rejecting 
the whole if the proportion of defectives in the sample is unduly large. In many 
inspections, however, the objective is to eliminate all the defective members of 
the population. This situation arises in manufacturing processes where the 
defect being tested for can result in disastrous failures. It also arises in certain 
inspections of human populations. Where the objective is to weed out indi- 
vidual defective units, a sample inspection will clearly not suffice. It will be 
shown in this paper that a different statistical approach can, under certain con- 
ditions, yield significant savings in effort and expense when a complete elimina- 
tion of defective units is desired. 

It should be noted at the outset that when large populations are being in- 
spected the objective of eliminating all units with a particular defect can never 
be fully attained. Mechanical and chemical failures and, especially, man-failures
make it inevitable that mistakes will occur when many units are being
examined. Although the procedure described in this paper does not directly 
attack the problem of technical and psychological fallibility, it may contribute 
to its partial solution by reducing the tediousness of the work and by making 
more elaborate and more sensitive inspections economically feasible. In the 
following discussion no attention will be paid to the possibility of technical 
failure or operators’ error. 

The method will be described by showing its application to a large-scale pro- 
ject on which the United States Public Health Service and the Selective Service 
System are now engaged. The object of the program is to weed out all syphilitic 
men called up for induction. Under this program each prospective inductee is 
subjected to a “Wasserman-type” blood test. The test may be divided con- 
veniently into two parts: 

1. A sample of blood is drawn from the man.

2. The blood sample is subjected to a laboratory analysis which reveals the
presence or absence of “syphilitic antigen.” The presence of syphilitic
antigen is a good indication of infection.

When this procedure is used, N chemical analyses are required in order to
detect all infected members of a population of size N.

The germ of the proposed technique is revealed by the following possibility. 
Suppose that after the individual blood sera are drawn they are pooled in groups 
of, say, five and that the groups rather than the individual sera are subjected to
chemical analysis. If none of the five sera contributing to the pool contains 
syphilitic antigen, the pool will not contain it either and will test negative. If, 
however, one or more of the sera contain syphilitic antigen, the pool will contain 
it also and the group test will reveal its presence.^ The individuals making up 
the pool must then be retested to determine which of the members are infected. 
It is not necessary to draw a new blood sample for this purpose since sufficient 
blood for both the test and the retest can be taken at once. The chemical 
analyses require only small quantities of blood. 

Two questions arise immediately: 

1. Will the group technique require fewer chemical analyses than the indi-
vidual technique and, if so, what is the extent of the saving; and

2. What is the most efficient size for the groups? 

Both questions are answered by a study of the probability of obtaining an 
infected group. Let 

p = the prevalence rate per hundred, that is, the probability
that a random selection will yield an infected individual.
Then

1 − p = the probability of selecting at random an individual free
from infection. And

(1 − p)^n = the probability of obtaining by random selection a group
of n individuals all of whom are free from infection.
Then

p' = 1 − (1 − p)^n = the probability of obtaining by random selection a group
of n with at least one infected member.

Further

N/n = the number of groups of size n in a population of size N,
so

p'N/n = the expected number of infected groups of n in a popu-
lation of N with a prevalence rate of p.

The expected number of chemical analyses required by the grouping pro- 
cedure would be 

E(T) = N/n + n(N/n)p'

or the number of groups plus the number of individuals in groups which require
retesting.^ The ratio of the number of tests required by the group technique to
the number required by the individual technique is a measure of its expected
relative cost. It is given by:

C = E(T)/N = 1/n + p'

^ Diagnostic tests for syphilis are extremely sensitive and will show positive results for 
even great dilutions of antigen. 

* The variance of T is $nNp'(1 - p') = nN[(1 - p)^n - (1 - p)^{2n}]$. The coefficient
of variation of T becomes small rapidly as N increases.
The extent of the savings attainable by use of the group method depends on the
group size and the prevalence rate. Figure 1 shows the shape of the relative
cost curve for five prevalence rates ranging from .01 to .15.* For a prevalence
rate of .01 it is clear from the chart that only 20% as many tests would be
required by group tests with groups of 11 as by individual testing. The at-
tainable savings decrease as the prevalence rate increases, and for a prevalence
rate of .15, 72% as many tests are required by the most efficient grouping as by
individual testing. The optimum group size for a population with a known
prevalence rate is the integral value of n which has the lowest corresponding
value on the relative cost curve for that prevalence rate.
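The relative cost C = 1/n + p', with p' = 1 − (1 − p)^n, is simple to tabulate. The sketch below evaluates it for integral group sizes and picks the cheapest one, reproducing the figures quoted above for prevalence rates of .01 and .15 (groups of 11 and 3, costs of about 20% and 72%). The search range of 2 to 100 is an arbitrary choice, and, as in the text, technical failures and operators' errors are ignored.

```python
def relative_cost(p, n):
    """C = 1/n + p', where p' = 1 - (1 - p)**n is the chance a group of n is infected."""
    return 1 / n + 1 - (1 - p) ** n

def optimum_group(p, max_size=100):
    """Integral group size n >= 2 with the lowest relative cost for prevalence rate p."""
    n = min(range(2, max_size + 1), key=lambda size: relative_cost(p, size))
    return n, relative_cost(p, n)

for p in (0.01, 0.02, 0.10, 0.15):
    n, c = optimum_group(p)
    print(f"p = {p:.2f}: optimum group size {n}, relative cost {c:.0%}, saving {1 - c:.0%}")
```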

TABLE I

Optimum Group Sizes and Relative Testing Costs for Selected Prevalence Rates

Prevalence Rate   Optimum Group   Relative Testing   Percent Saving
(per cent)        Size            Cost (per cent)    Attainable
       1               11               20                80
       2                8               27                73
       3                6               33                67
       4                6               38                62
       5                5               43                57
       6                5               47                53
       7                5               50                50
       8                4               53                47
       9                4               56                44
      10                4               59                41
      12                4               65                35
      13                3               67                33
      15                3               72                28
      20                3               82                18
      25                3               91                 9
      30                3               99                 1

Optimum group sizes and their costs relative to the cost of individual testing 
are given in Table I for selected prevalence rates. 

This table, together with the description of the group testing technique as it 
might be applied to blood tests for syphilis, reveals the tw^o conditions for the 
economical application of the technique: 

1. That the prevalence rate be sufficiently small to make worth while econo- 
mies possible; and 

* The prevalence rate of syphilis among the first million selectees and volunteers was
.0186 for whites and .2477 for other races. Geographically, the prevalence rate for whites
ranged from .0606 in Arizona to .0061 in Wisconsin. See Parran, Thomas and Vonderlehr,
R. A., Plain Words about Venereal Disease, Reynal and Hitchcock, New York.
2. That it be easier or more economical to obtain an observation on a group
than on the individuals of the group separately.

Where these conditions exist, it will be more economical to locate defective mem- 
bers of a population by means of group testing than by means of individual 
testing. 

The principle of group testing may be applied to situations where the interest
centers in the degree to which an imperfection is present rather than merely in
its presence or absence. For example, it could be applied to lots of chemicals
where it is desired to reject all batches with more than a certain degree of im-
purity. If n samples of a chemical are pooled and subjected to a single analysis,
the degree of impurity in the pool will be the average of the impurities in the
separate samples. If the criterion were adopted that the members of a pool
would be examined individually whenever the proportion of impurity in the pool
is greater than 1/n-th the maximum acceptable degree of impurity, clearly no
excessively impure batches would get by, since a single such batch would by itself
raise the pool average above that level. The extent of the saving accomplished
by this means can be computed by letting p' equal the probability that the pool
will be impure enough to warrant retesting its constituent batches and using the
formulas given above. The probability, p', can be calculated easily from the
probability distribution of impurities in the separate batches.

Fig. 1. Economies resulting from blood testing by groups (horizontal axis: size of
group; vertical axis: analyses per hundred blood tests). P.R. denotes prevalence rate.

It is evident that this approach will produce worthwhile savings only if the 
limit of acceptability is liberally above the per cent of impurity encountered in 
the bulk of the batches. It is also evident that under this scheme many of the 
retests will indicate that all the batches in the pool are acceptable and that the
retesting was not really needed. The criterion for retesting can be raised above 
1/n-th the limit of acceptability at the cost of a relatively small risk of accepting 
overly impure batches. The probability of failing to detect a defective batch 
when the retest criterion is raised in this manner will depend upon the form and 
parameters of the distribution of imperfection in single batches, as well as upon 
the number of batches in the pool. No simple general solution for this problem 
has been found. 


FURTHER POINTS ON MATRIX CALCULATION AND SIMULTANEOUS 

EQUATIONS 

By Harold Hotelling 
Columbia University 

Since the publication of “Some new methods in matrix calculation” in the
Annals of Mathematical Statistics (March, 1943, pp. 1-34), the following relevant
points have come to the attention of the author.

A. T. Lonseth has improved substantially the limit of error for the efficient 
method of inverting a matrix described on p. 14. He writes: 

“Your use of the ‘norm’ of a matrix in the Annals paper especially interests
me, as I was recently led to use it in solving the errors problem for infinite
linear systems which are equivalent to Fredholm-type integral equations.
“It is possible to replace the term p* in your inequality (7.5) by one, so that

$$N(C_m - A^{-1}) \le N(C_0)\,k^{2^m}/(1 - k).$$

To see this, one observes that from the developments on the bottom of p. 13
it follows that $(I - D)^{-1} = I + D^*$, where $N(D^*) \le k/(1 - k)$. Then

$$C_0(I - D)^{-1} = C_0 + C_0 D^*,$$

so that

$$N[C_0(I - D)^{-1}] \le N(C_0) + N(C_0)\,N(D^*) = N(C_0)\{1 + N(D^*)\},$$

from which the result stated is seen to follow. I happen to have noticed
this because the same thing has cropped up often in my recent work, and for
the infinite case a bound p* is no bound at all.

“Your paper has suggested improvements in my own proofs, for which I 
am grateful." 

Dr. Lonseth's first formula above might well be written at the bottom of p. 14 
of my article as a substitute for (7.5). It both simplifies and reduces the limit 
of error. 
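Lonseth's bound is easy to observe numerically. This note does not restate the inversion method, so the sketch assumes that the method of p. 14 is the iteration C_{m+1} = C_m(2I − AC_m), started from an approximation C_0 with D = I − AC_0 and k = N(D) < 1. It also assumes that N(X) is the square root of the sum of squares of the elements. The 3 × 3 matrix and the diagonal starting value are arbitrary examples.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matsub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def norm(X):
    """N(X): square root of the sum of squared elements."""
    return math.sqrt(sum(x * x for row in X for x in row))

def iterate_inverse(A, C0, steps):
    """Iterates C_{m+1} = C_m (2I - A C_m); returns [C_0, C_1, ..., C_steps]."""
    two_I = [[2 * x for x in row] for row in identity(len(A))]
    Cs = [C0]
    for _ in range(steps):
        C = Cs[-1]
        Cs.append(matmul(C, matsub(two_I, matmul(A, C))))
    return Cs

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
A_inv = [[v / 18 for v in row] for row in [[5, -2, 1], [-2, 8, -4], [1, -4, 11]]]  # exact inverse
C0 = [[0.25, 0.0, 0.0], [0.0, 1 / 3, 0.0], [0.0, 0.0, 0.5]]   # reciprocal diagonal as a start
k = norm(matsub(identity(3), matmul(A, C0)))                  # k = N(D), D = I - A C0
for m, C in enumerate(iterate_inverse(A, C0, 4)):
    error = norm(matsub(C, A_inv))
    bound = norm(C0) * k ** (2 ** m) / (1 - k)                # Lonseth's bound
    print(f"m={m}: N(C_m - A^-1) = {error:.3e} <= {bound:.3e}")
```

The argument in the letter avoids any factor for N(I) because it bounds N[C_0(I − D)^{-1}] through C_0 + C_0D* rather than through N((I − D)^{-1}).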

A method of solving normal equations by iteration, in which trial values of 
the unknown regression coefficients were applied to the values of the predictors 
and predictand in each of the N cases, and the results were used to improve the
trial values, was orally suggested by John C. Flanagan in 1934. The plan in- 
volved the use of punched cards for the N substitutions in the trial regression 
equation at each stage. However, it seemed on further consideration and dis- 
cussion that this would involve an unnecessarily large amount of work, since 
other methods require only as many substitutions at each stage as the number 
of unknowns, which is always less than N and usually very much less. I believe 
that Dr. Flanagan thereupon abandoned this plan and never published it. 

Louis I. Guttman has proposed a similar method,^ and has provided a proof 
of convergence in certain cases. In a final section he shows that the method can 
be modified by applying the same type of iterations to the normal, or product- 
sum, matrix instead of to the matrix of observations. This modification avoids 
the difficulty mentioned above. It is stated that one of these methods has been 
applied to a 64-variable problem.

The first method of section 10 of my paper for solving sets of linear equations 
is equivalent, in the case of normal equations, to the second method of Dr. 
Guttman. It is regrettable that reference to his study was omitted. 
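To make the cost comparison concrete, here is a minimal sketch that uses Gauss-Seidel sweeps on the normal equations. Gauss-Seidel is an assumption chosen for illustration; it is not claimed to be Dr. Flanagan's, Dr. Guttman's, or section 10's exact scheme. The N cases are touched only once, when the product-sum matrix $S = X'X$ and the vector $g = X'y$ are formed; every later sweep works with the $p \times p$ matrix alone. Gauss-Seidel converges for any symmetric positive definite $S$, which holds when the predictors are linearly independent. The function names are hypothetical.

```python
def normal_equations(rows, y):
    """Form S = X'X and g = X'y in a single pass over the N cases."""
    p = len(rows[0])
    s = [[0.0] * p for _ in range(p)]
    g = [0.0] * p
    for x, yi in zip(rows, y):
        for i in range(p):
            g[i] += x[i] * yi
            for j in range(p):
                s[i][j] += x[i] * x[j]
    return s, g


def gauss_seidel(s, g, max_sweeps=10_000, tol=1e-13):
    """Solve S b = g by Gauss-Seidel sweeps; each sweep costs about p*p
    multiplications, independent of the number of cases N."""
    p = len(g)
    b = [0.0] * p
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(p):
            # Use the newest values of the other coefficients immediately.
            rest = sum(s[i][j] * b[j] for j in range(p) if j != i)
            new_bi = (g[i] - rest) / s[i][i]
            largest_change = max(largest_change, abs(new_bi - b[i]))
            b[i] = new_bi
        if largest_change < tol:
            break
    return b
```

With exact data $y = 1 + 2x_1 - x_2$ on five cases (first column of ones for the constant term), the sweeps recover the coefficients 1, 2 and -1.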

R. D. Gordon believes that the inequalities for principal components obtained 
at the end of the paper can be improved, but his entry into the army has pre- 
vented his fully working out his ideas. Paul A. Samuelson has some new and 
as yet unpublished ideas relating to calculation of principal components. 

Merrill M. Flood, in “A computational procedure for the method of principal
components,” Psychometrika, Vol. 5 (1940), pp. 169-172, presents a method which
appears to have considerable value, in that the number of vector multiplications 
is relatively small. However, it requires solution of a system of p − 1 linear
equations for each latent vector determined, and also of an additional such 
system. The relative value of this and other methods may depend on the rela- 
tive costs of vector multiplication and of solving systems of linear equations. 
This in turn depends on the mechanical facilities available. 

Paul Horst’s paper, “A method for determining the coefficients of a characteristic
equation” (Annals of Mathematical Statistics, Vol. 6 (1935), pp. 83-84)
should have been referred to in connection with sections 11 and 12. 

On p. 23 of “Some new methods in matrix calculation,” in the sixth line from
the bottom, smaller should be replaced by greater. On p. 32, the last expression
in the third line should have η in place of [illegible]. The last displayed formula on
this page should read

[formula illegible in this copy]

and the subscript r + 1 in the next line should be k + 1.

¹ “An iterative method for multiple correlation,” The Prediction of Personal Adjustment,
by Paul Horst and collaborators, Social Science Research Council, New York, 1941,
pp. 313-318.
NEWS AND NOTICES

Readers are invited to submit to the Secretary of the Institute news items of general interest.

Personal Items 

Dr. Paul H. Anderson is Regional Statistician with the War Production Board 
and Lecturer in Mathematics at Western Reserve University. 

Mr. Kenneth J. Arrow is a second lieutenant with the United States Army 
Air Forces. 

Assistant Professor H. M. Bacon of Stanford University has been promoted 
to an associate professorship. 

Dr. G. A. Baker of the College of Agriculture of the University of California 
has been promoted to an assistant professorship. 

Mr. Blair M. Bennett is attached to the Operations Research Section of the 
Eighth Bomber Command. 

Mr. Richard Berger is in the United States Army Air Forces. 

Mr. John L. Carlson is a lieutenant in the United States Naval Reserve. 

Mr. Edward P. Colman is a major in the Coast Artillery Corps and is stationed 
at West Point. 

Mr. William F. Elkin has been appointed Research Secretary of the Philadelphia 
Tuberculosis and Health Association. 

Professor R. A. Fisher, Galton Professor in the University of London since 
1933, has been appointed to the chair of Genetics in Cambridge University. 

Dr. J. P. Guilford is a lieutenant colonel in the Army Air Forces. He is chief of
the Field Research Unit, Psychological Section of the Surgeon’s Office, with
headquarters at Fort Worth.

Dr. Edward Helly of Monmouth Junior College has been appointed Visiting
Lecturer at the Illinois Institute of Technology.

Dr. H. B. Mann has been appointed to an instructorship at Bard College, 
Columbia University. 

Dr. Nilan Norris is a lieutenant in the Army Air Forces, and is serving as Sta- 
tistical Officer. 

Dr. Edwin G. Olds has been granted leave by Carnegie Institute of Technology 
to act as Chief Statistical Consultant to the Industrial Processes Branch of the 
Office of Production Research and Development, War Production Board. 

Miss Ruth L. Owen has been commissioned as an ensign in the United States 
Naval Reserve. She is acting as Supply and Disbursing Officer for the Naval 
V-12 Unit at St. Lawrence University. 

Mr. Robert W. Royston is a lieutenant in the United States Naval Reserve. 

Dr. H. M. Schwartz has been appointed Assistant Professor of Mathematics at 
the University of Idaho. 

Mr. William B. Simpson is now a member of the armed forces and is stationed 
at Camp Crowder. 

Mr. Irvin Stein is an ensign in the United States Naval Reserve. 
Mr. Milton S. Stevens is an apprentice seaman in the United States Naval
Reserve. 

Mr. W. A. Vezeau has been promoted to the rank of Assistant Professor of 
Mathematics at the University of Detroit. 

Organization of Washington Chapter of the Institute 

Professor Harold Hotelling of Columbia University spoke at George Wash- 
ington University, November 19, 1943, under the auspices of the Institute of 
Mathematical Statistics before an audience of over 150 persons. The subject of 
his lecture was Multivariate Statistical Analysis. At the close of the lecture the
Washington Chapter of the Institute was organized. A Planning Committee 
consisting of William G. Madow, Chairman, Meyer A. Girshick, and W. Ed- 
wards Deming was elected. Members of the Institute who are interested in 
being in contact with the Washington Chapter should write to William G.
Madow, Bureau of Agricultural Economics, Department of Agriculture.

New Members 

The following persons have been elected to membership in the Institute: 

Belz, Asso. Prof. Maurice H. M.A. (Melbourne) Univ. of Melbourne, Carlton, N. 3, 
Victoria, Australia. 

Capó, Bernardo G. Ph.D. (Cornell) Biometrician, Agric. Exp. Station, Rio Piedras,
Puerto Rico, Rosario St., Santurce.

Crawford, Elizabeth S. B.A. (Mundelein Coll.) Asso. Labor Market Analyst, War Manpower
Commission, 9S5 Lincoln St., Denver, Colo.

Crawford, James R. Div. Supervisor, Vega Aircraft Corp., 11626 Kittridge St., N. Hollywood,
Calif.

Hoffer, Prof. Irwin S. M.B.A. (Harvard) Temple Univ., Philadelphia, Pa., Willow Ave.,
Ambler, Pa.

Maynard, Burton I. A.B. (Stanford) Stat. Analyst and Stat., 11211 Brookhaven Ave.,
Los Angeles, Calif.

Mazza, Prof. Sigfrido C. Dir., Instituto de Estadistica, Facultad de C. Economicas,
Tristan Narvaja 1474, Montevideo, Uruguay.

Motock, George T. M.S. (Carnegie Inst. Tech., Ohio State) Dir. of Res., Republic Steel 
Corp., Cleveland, Ohio. 

Neurath, Paul M. LL.D. (Vienna) Lecturer, Coll. of the City of New York, N. Y., 649
W. 115 St.

Page, Warren H. B.A. (Queens Coll.) Pfc., U. S. Army, 3301 A.S.T.U. Virginia Polytech- 
nic Inst., Blacksburg, Va. 

Pearson, Prof. E. S. D.Sc. (London) University Coll., Gower St., London, W. C. 1, Eng.

Rock, Sibyl M. B.A. (California) Res. Asso., Consolidated Eng. Corp., 98S N. Holliston
St., Pasadena, Calif.

Rule, Wayne B. M.S. (Iowa) Sr. Analysis Clerk, Metropolitan Life Ins. Co., 29 Utopian
Ave., Suffern, N. Y.

Thompson, Walter H. M.S. (Iowa) Sgt., U. S. Army; Instr., Agric. Economics Dept., 
Virginia Polytechnic Inst., Blacksburg, Va. 

Thomson, Prof. Godfrey H. D.Sc. (Durham) Dir. of the Training of Teachers, Univ. of 
Edinburgh, Edinburgh, Scotland. 
REPORT ON THE NEW BRUNSWICK MEETING OF THE INSTITUTE

The Sixth Summer Meeting of the Institute of Mathematical Statistics was 
held at The New Jersey College for Women, Rutgers University, Sunday and 
Monday, September 12 and 13, 1943, in conjunction with the meetings of the 
American Mathematical Society and the Mathematical Association of America. 
The following fifty-two members of the Institute attended the meeting:

T. W. Anderson, H. E. Arnold, K. J. Arnold, L. A. Aroian, B. M. Bennett, E. E. Blanche,
C. I. Bliss, A. H. Bowker, Hobart Bushey, W. G. Cochran, T. F. Cope, C. C. Craig, H. B.
Curry, J. H. Curtiss, J. F. Daly, Mary Elveback, W. Feller, R. M. Foster, J. A. Greenwood,
J. I. Griffin, C. C. Grove, F. E. Grubbs, E. J. Gumbel, Harold Hotelling, Tjalling Koopmans,
H. G. Landau, Howard Levene, Simon Lopata, P. J. McCarthy, W. G. Madow, Margaret
Martin, J. W. Mauchly, E. B. Mode, L. F. Nanni, O. Oakley, P. S. Olmstead,
F. E. Satterthwaite, Bernice Scherl, H. M. Schwartz, L. W. Shaw, J. Shohat, Blanche Skalak,
Mortimer Spiegelman, Arthur Stein, H. W. Steinhaus, A. W. Tucker, J. W. Tukey,
D. F. Votaw, Abraham Wald, S. S. Wilks, Jacob Wolfowitz, Bertram Yood.

Professor S. S. Wilks of Princeton University acted as chairman for the Sunday 
morning session. The following papers were presented: 

1. Some New Statistical Applications of Partitioned Matrices and Iterative Methods. 
Harold Hotelling, Columbia University 

2. On the Construction of Orthogonal Latin Squares.

Henry B. Mann, Columbia University 

Dr. Jacob Wolfowitz, Columbia University, presided at the session on Sunday 
afternoon. At this session the following papers were presented: 

1. Recent Developments in the Statistical Analysis of Problems Requiring the Use of 
Vector Variates. 

W. G. Madow, Office of Price Administration. 

2. Statistical Inference when the Form of the Distribution Function is Unknown. 

Henry Scheffé, Princeton University.

The session on Monday morning was held jointly with the American Mathe- 
matical Society. Professor C. C. Craig, University of Michigan, acted as chair- 
man, and the following contributed papers were read: 

1. Asymptotic Distributions of Ascending and Descending Runs, 

Jacob Wolfowitz, Columbia University. 

2. On the Plotting of Statistical Observations. 

E. J. Gumbel, New School for Social Research. 

3. On a Measure-Theoretic Problem Arising in the Theory of Non-Parametric Tests.
(Read by title.) 

Henry Scheffé, Princeton University.

4. On a General Class of “Contagious” Distributions.

Will Feller, Brown University. 

5. On the Statistical Treatment of Linear Stochastic Difference Equations.

H. B. Mann and Abraham Wald, Columbia University. 

6. An Exact Test for Randomness in the Non-Parametric Case Based on Serial Correlation.

Abraham Wald and Jacob Wolfowitz, Columbia University. 
On Saturday afternoon the members of the three societies were the guests of
Miss Margaret Trumbull Corwin, Dean of the College, New Jersey College for 
Women, at an informal reception at the Dean’s House. On Sunday evening an 
informal buffet supper for the mathematical organizations was served at Wood 
Lawn, the Alumnae House of the New Jersey College for Women. Later the 
same evening the Department of Music presented a Musicale in the Music 
Building. 

Edwin G. Olds, 

Secretary 


REPORT ON THE SECOND MEETING OF THE PITTSBURGH 
CHAPTER OF THE INSTITUTE 

The second meeting of the Pittsburgh Chapter of the Institute of Mathemati- 
cal Statistics was held at Carnegie Union, Carnegie Institute of Technology, on 
Saturday, October 9, 1943. Thirty-four persons attended the meeting, including 
the following eight members of the Institute: 

W. O. Clinedinst, G. G. Eldredge, K. L. Fetters, H. J. Hand, G. E. Niver, 
F. G. Norris, E. G. Olds, E. M. Schrock. 

At the morning session Mr. Charles E. Young, Westinghouse Electric and 
Manufacturing Company, presented a paper entitled “Analysis of Cyclical Fluctuations.”
The program for the afternoon session consisted of a paper entitled
“Use of orthogonal coordinates in linear regression,” presented by Mr. W. O. 
Clinedinst, National Tube Company. Mr. F. G. Norris, President of the Pitts- 
burgh Chapter, acted as chairman for both sessions. 

Howard Hand, 

Secretary of the Pittsburgh Chapter 


ABSTRACTS OF PAPERS 

(Presented Monday, September 13, 1943, at the New Brunswick Meeting 

of the Institute) 

Asymptotic Distributions of Ascending and Descending Runs. Jacob Wolfowitz,
Columbia University.

Let $a_1, a_2, \ldots, a_N$ be any permutation of $N$ unequal numbers. Let there be assigned
to each permutation the same probability. An element $a_i$ $(1 < i < N)$ is called a turning
point if $a_i$ is greater than or less than both $a_{i-1}$ and $a_{i+1}$. Let $a_j$ and $a_{j+k}$ be
consecutive turning points; they are said to determine a “run” of length $k$. The author
obtains the asymptotic distributions of a large class of functions of these runs. An example
of his results is the following: it is proved that the following are asymptotically normally
distributed: (a) the total number of runs; (b) $R(p)$, the number of runs of length $p$;
(c) $R(p)$ and $R(q)$ jointly. Similar results are obtained for runs defined by any of a large
set of criteria, of which the one given above is of value in statistical applications.
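The definitions above translate directly into a few lines of code. This sketch uses 0-based indices and, following the abstract, counts only runs between consecutive turning points, so the stretches before the first and after the last turning point are not counted as runs. The function names are hypothetical.

```python
def turning_points(a):
    """0-based indices i, 0 < i < N - 1, at which a[i] exceeds both neighbours
    or is exceeded by both (the values are assumed to be distinct)."""
    return [
        i for i in range(1, len(a) - 1)
        if (a[i] > a[i - 1] and a[i] > a[i + 1])
        or (a[i] < a[i - 1] and a[i] < a[i + 1])
    ]


def run_lengths(a):
    """Length k of each run determined by consecutive turning points a_j, a_{j+k}."""
    tp = turning_points(a)
    return [later - earlier for earlier, later in zip(tp, tp[1:])]


def runs_of_length(a, p):
    """R(p): the number of runs of length p."""
    return sum(1 for k in run_lengths(a) if k == p)
```

For the sequence 2, 5, 1, 3, 4, 6, 0 the turning points are the values 5, 1 and 6, giving one run of length 1 and one run of length 3.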
On the Plotting of Statistical Observations. E. J. Gumbel, The New School
for Social Research.

It is well known that there exist two step functions corresponding to a continuous variate.
We may attribute to the $m$-th observation the ranks $m$ or $m - 1$. To obtain one and only
one serial number $\bar m$, which will, in general, not be an integer, we attribute to $x_m$ an
adjusted frequency $m - \Delta$, namely the probability of the most probable $m$-th value. The
correction $\Delta$ for the rank thus introduced depends upon the distribution. If the variate
is unlimited and possesses a mode, $\Delta$ increases for increasing values of the variate from
zero up to unity. The correction is important for small numbers of observations. For large
numbers of observations and for the ogive it is sufficient to choose $\Delta = 1/2$. The
calculation of $\Delta$ allows a correct plotting of all observations (including the first and
last) on probability paper (equiprobability test). For the return periods, the ranks $m$ and
$m - 1$ correspond to the observed exceedance and recurrence intervals. The correction
$\Delta$ leads to adjusted return periods which pass for increasing values of the variate from
the exceedance to the recurrence intervals, provided the variate is unlimited and possesses
a single mode. The asymptotic standard error of the partition values may be used to
construct confidence bands for the ogive, the equiprobability test, and the return periods.
This control for the fit between theory and observation may be applied to all observations
which are not extreme.
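A rough sketch of plotting positions, assuming the frequency attached to the $m$-th smallest of $N$ observations has the form $(m - \Delta)/N$ and using the constant correction $\Delta = 1/2$ recommended above for large samples, rather than Gumbel's distribution-dependent correction. The return period is taken here as $1/(1 - F)$, a standard definition assumed for illustration. The function names are hypothetical.

```python
def plotting_positions(sample, delta=0.5):
    """Pair each ordered observation x_m (m = 1..N) with the frequency (m - delta)/N.
    delta is an assumed constant correction, not Gumbel's exact Delta."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, (m - delta) / n) for m, x in enumerate(xs, start=1)]


def return_period(f):
    """Return period 1/(1 - F) for a non-exceedance probability F; infinite at F = 1."""
    return float("inf") if f >= 1 else 1.0 / (1.0 - f)
```

With $\Delta = 0$ the largest observation receives frequency 1 and cannot be placed on probability paper (its return period is infinite); with $\Delta = 1/2$ every observation, including the first and last, receives a frequency strictly between 0 and 1.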

On a Measure-Theoretic Problem Arising in the Theory of Non-Parametric
Tests. Henry Scheffé, Princeton University.

Let $F(x)$ be the cumulative distribution function of a univariate population. Denote a
sample from the population by the sample point $E = (x_1, x_2, \ldots, x_k)$ and let $w$ be a
Borel region in the sample space. How can we characterize $w$ in order that $\Pr\{E \in w\}$
be independent of $F(x)$ for all $F$ in a given class of distribution functions? For various
classes of $F$ necessary conditions and sufficient conditions are found. For example, if the
boundary of $w$ is a null set, a necessary and sufficient condition for $w$ to have the desired
property for all absolutely continuous $F(x)$ is that it have the following structure except
on a null set: for every point $E$ in the sample space, $M$ of the $k!$ points obtained by
permuting the coordinates of $E$ are in $w$ and the remaining $k! - M$ are not $(0 \le M \le k!)$.
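The permutation structure can be illustrated with a Monte Carlo sketch. The region used here, "the first coordinate is the largest", is a hypothetical example with $k = 3$: each point has $M = 2$ of its $3! = 6$ coordinate permutations in the region, so for independent coordinates from any continuous $F$ the probability should be $M/k! = 1/3$. The helper names and the choice of samplers are illustrative assumptions.

```python
import itertools
import math
import random


def first_is_largest(e):
    """Hypothetical example region w: the first coordinate exceeds the others."""
    return e[0] > max(e[1:])


def count_M(point, region):
    """How many of the k! coordinate permutations of the point lie in the region."""
    return sum(1 for perm in itertools.permutations(point) if region(perm))


def predicted_probability(point, region):
    """M / k!, the probability implied by the permutation structure."""
    return count_M(point, region) / math.factorial(len(point))


def estimate_probability(sampler, region, k, trials=100_000, seed=0):
    """Monte Carlo estimate of Pr{E in w} when the k coordinates are drawn
    independently by sampler(rng)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        e = tuple(sampler(rng) for _ in range(k))
        hits += region(e)
    return hits / trials
```

Running `estimate_probability` with uniform, exponential and normal samplers gives estimates close to 1/3 in each case, as the structure condition predicts.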

On a General Class of “Contagious” Distributions. W. Feller, Brown Uni- 
versity. 

This paper is concerned with some properties of a class of contagious distributions which 
contains, among others, some distributions studied by Greenwood and Yule, Polya, and 
Neyman, respectively. 

On the Statistical Treatment of Linear Stochastic Difference Equations. 

H. B. Mann and A. Wald, Columbia University. 


For any integer $t$ let $x_{1t}, \ldots, x_{rt}$ be a set of $r$ random variables which satisfy the system
of linear stochastic difference equations

$$\sum_{j=1}^{r}\sum_{k} \alpha_{ijk}\,x_{j,t-k} + \alpha_i = \epsilon_{it} \qquad (i = 1, \ldots, r).$$

The coefficients $\alpha_{ijk}$ and $\alpha_i$ are (known or unknown) constants and the vectors
$(\epsilon_{1t}, \ldots, \epsilon_{rt})$ $(t = 1, 2, \ldots, \text{ad inf.})$ are independently distributed
random vectors each having the same distribution. It is assumed that $E(\epsilon_{it}) = 0$.
The problem dealt with in this paper is to estimate the unknown coefficients $\alpha_{ijk}$ and
$\alpha_i$ on the basis of $Nr$ observations $x_{it}$ $(i = 1, \ldots, r;\ t = 1, \ldots, N)$. The statistics
used as estimates of the unknown coefficients are identical with the maximum likelihood
estimates if $\epsilon_t$ is normally distributed. The joint limiting distribution of these estimates
is obtained without assuming normality of the distribution of $\epsilon_t$.
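For the simplest case, a single equation ($r = 1$) with one lag, the equation can be written $x_t - a\,x_{t-1} - c = \epsilon_t$, that is $\alpha_{110} = 1$, $\alpha_{111} = -a$, $\alpha_1 = -c$ (the normalization $\alpha_{110} = 1$ is an assumption of this sketch). Conditional on the first observation, the normal-theory maximum likelihood estimates are then the ordinary least squares coefficients of $x_t$ on $x_{t-1}$ and a constant. The sketch fits them to simulated data with uniform, hence non-normal, errors; it illustrates the idea only and is not Mann and Wald's general procedure for $r$ equations. Function names are hypothetical.

```python
import random


def simulate_ar1(a, c, n, seed=0):
    """x_t = c + a*x_{t-1} + e_t with e_t uniform on (-1, 1): mean zero, not normal.
    Starts at the stationary mean c / (1 - a), so |a| < 1 is assumed."""
    rng = random.Random(seed)
    x = [c / (1 - a)]
    for _ in range(n):
        x.append(c + a * x[-1] + rng.uniform(-1.0, 1.0))
    return x


def least_squares_ar1(x):
    """Regress x_t on x_{t-1} and a constant; return (a_hat, c_hat)."""
    y, z = x[1:], x[:-1]
    n = len(y)
    mean_y = sum(y) / n
    mean_z = sum(z) / n
    szz = sum((zi - mean_z) ** 2 for zi in z)
    szy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    a_hat = szy / szz
    c_hat = mean_y - a_hat * mean_z
    return a_hat, c_hat
```

With 20,000 simulated steps from $a = 0.5$, $c = 1$, the estimates land close to the true values even though the errors are far from normal.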

An Exact Test for Randomness in the Non-Parametric Case Based on Serial 
Correlation. A. Wald and J. Wolfowitz, Columbia University. 

Let $X_1, \ldots, X_n$ be $n$ chance variables, about the distribution of which nothing is known.
Let the problem be to test the (null) hypothesis that $X_1, \ldots, X_n$ are independently
distributed with the same distribution function. It is shown that an exact test of this
hypothesis based on the serial correlation coefficient can be made. For this purpose the
distribution of the serial correlation coefficient in the sub-population consisting of all
possible permutations of the observed values is employed. Under the null hypothesis, this
distribution is independent of the distribution function of $X_i$ $(i = 1, \ldots, n)$. Several
exact moments are obtained and asymptotic normality is proved.
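A brute-force version of the exact permutation test, assuming the circular statistic $R = \sum_{i=1}^{n} x_i x_{i+1}$ with $x_{n+1} = x_1$. Because $S_1 = \sum x_i$ and $S_2 = \sum x_i^2$ do not change under permutation, the circular serial correlation coefficient is a monotone function of $R$ over the permutation distribution, so testing with $R$ is equivalent. Enumerating all $n!$ orderings is feasible only for small $n$; the asymptotic normality result is what makes larger samples practical. As a check, the average of $R$ over all permutations equals $(S_1^2 - S_2)/(n - 1)$. Function names are hypothetical.

```python
import itertools
from fractions import Fraction


def circular_serial_stat(x):
    """R = sum_i x_i * x_{i+1}, with x_{n+1} taken to be x_1."""
    n = len(x)
    return sum(x[i] * x[(i + 1) % n] for i in range(n))


def exact_permutation_test(x):
    """Exact upper-tail p-value of R over all n! orderings of the observed values.
    Returns (observed R, list of R over all permutations, p-value as a Fraction)."""
    observed = circular_serial_stat(x)
    values = [circular_serial_stat(p) for p in itertools.permutations(x)]
    p_value = Fraction(sum(v >= observed for v in values), len(values))
    return observed, values, p_value
```

For the observations 1, 2, 3, 4, 5 the observed value is $R = 45$ and the permutation mean is $(225 - 55)/4 = 42.5$.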
Bassford, Blackadar, Boozer, Boschan, Bowker, Brookner, Burgess, Bushey, Court, Daly,
Dodge, Edwards, Eisenhart, Elveback, Fertig, Fry, Goode, R. D. Gordon, Gumbel, Haavelmo,
Hilfer, Jablon, Levene, Levine, Levy, Lew, Li, Lorge, Lotka, Martin, P. J. McCarthy,
Neurath, Noether, N. Norris, Paulson, Peterson, Preinreich, Preston, Ratkowitz, Riordan,
W. S. Robinson, Romig, Roos, Schapiro, Scherl, L. G. Simon, Skalak, Spiegelman, Steinhaus,
Stevens, M. N. Torrey, Wald, Walker, Wallis, Wilkinson, Wolfowitz, Zeiger, Zubin.

Parkchester. Sternhell. 

Port Washington. Kimball. 

Poughkeepsie. Hopper. 

Richmond Hill. Spaney. 




